MOLECULAR TYPING OF GROUP B STREPTOCOCCI
Field of the invention 

The present invention relates to molecular methods of typing group B
streptococci, as well as polynucleotides useful in such methods.

Background to the invention 

Group B streptococcus (GBS) - Streptococcus agalactiae - is the
commonest cause of neonatal and obstetric sepsis and an increasingly important
cause of septicaemia in the elderly and immunocompromised patients. The
incidence of neonatal GBS sepsis has been reduced in recent years by the use of
intrapartum antibiotic prophylaxis, but there are many problems with this
approach. In future, vaccination is likely to be preferred and there has been
considerable progress in development of conjugate polysaccharide GBS
vaccines.

Before the introduction of conjugate vaccines, extensive epidemiological
and other related studies will be required to assess, not only the burden of
disease, but also the distribution of GBS types (including capsular polysaccharide
gene serotypes, serosubtypes; protein antigen gene subtypes; mobile genetic
element subtypes) to determine the optimal formulation of vaccine antigens. Type
distribution based on one geographic location or small numbers of patients may
not be generally applicable. Continued monitoring will be necessary to assess the
suitability of combinations of GBS vaccine antigens for different target
populations in different geographic locations.

Nine capsular polysaccharide GBS serotypes have been described
(Harrison et al., 1998; Hickman et al., 1999). Various serotyping methods have
been used, including immuno-precipitation (Wilkinson and Moody, 1969), enzyme
immunoassay (Holm and Hakansson, 1988), coagglutination (Hakansson et al.,
1992), counter-immunoelectrophoresis and capillary precipitation (Triscott and
Davies, 1979), latex agglutination (Zuerlein et al., 1991), fluorescence microscopy
(Cropp et al., 1974) and inhibition-ELISA (Arakere et al., 1999). These methods
are labour-intensive and require high-titered serotype-specific antisera, which are
expensive, difficult to make and commercially available for only six serotypes
- Ia to V (Arakere et al., 1999). Molecular genotyping methods, such as
pulsed-field gel electrophoresis (Rolland et al., 1999) and restriction endonuclease
analysis (Nagano et al., 1991), are useful for epidemiological studies but do not
generally identify serotypes. Consequently, there is a need for a reliable molecular
method for GBS serotype identification.

Summary of the invention

We have identified specific regions within the genome of group B
streptococci of inter-type sequence heterogeneity that can be used to distinguish
different types (including capsular polysaccharide gene serotypes and
serosubtypes; protein antigen gene subtypes; and mobile genetic element
subtypes). We have shown that molecular methods that detect these sequence
heterogeneities can be used to accurately distinguish and type group B
streptococci.

Accordingly, in a first aspect the present invention provides a method of
typing a group B streptococcal bacterium, which method comprises analysing the
nucleotide sequence of one or more regions within the cpsD, cpsE, cpsF, cpsG
and/or cpsI/M genes of said bacterium, said region(s) comprising one or more
nucleotides whose sequence varies between types.

In particular, the nucleotide sequence may be analysed for one or more 
positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 204, 211, 
281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645,
803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

Preferably at least one region is within a sequence delineated by the 3'
136 bases of the cpsE gene and the 5' 218 bases of the cpsG gene of the
cpsE-cpsF-cpsG gene cluster of said group B streptococcal bacterium. In particular,
the nucleotide sequence may be analysed for one or more positions
corresponding to positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595,
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088,
2134, 2187 and 2196 as shown in Figure 1.

In one embodiment, at least one region is within the cpsI/M genes of said
group B streptococcal bacterium.

We have also shown that a number of surface protein antigen genes,
including rib, alp2 or alp3 genes, and five mobile genetic elements may be used
to subtype GBS at the molecular level. Accordingly, the present invention also
provides a method of typing a group B streptococcal bacterium which method comprises
determining the presence or absence in the genome of said bacterium of one or
more surface protein antigen genes selected from a rib, alp2 or alp3 gene, and/or
one or more mobile genetic elements selected from IS861, IS1548, IS1381,
ISSa4 and GBSi1. Preferably, such a method is combined with the above
methods of the invention.

The nucleotide sequence analysis step may comprise sequencing said
one or more regions. Alternatively, or in addition, the nucleotide sequence
analysis step may comprise determining whether a polynucleotide obtained from
said bacterium selectively hybridises to a polynucleotide probe comprising one or
more of the said regions, preferably to one or more of a plurality of polynucleotide
probes corresponding to one or more of the said regions.

In a preferred embodiment, where hybridisation to a plurality of probes is
used as a means of analysis, the plurality of polynucleotide probes are present as
a microarray.

In another embodiment, the nucleotide sequence analysis step comprises
an amplification step using one or more primers, at least one of which hybridises
specifically to a sequence which differs between types. Typically, primer pairs
are used, at least one member of which hybridises specifically to a sequence which
differs between types. Preferably, said primers are selected from the primers shown in
Table 2 and/or Table 6 and/or Table 10.

In a second aspect, the present invention provides a polynucleotide
consisting essentially of at least 10 contiguous nucleotides corresponding to a
region within a cpsD-cpsE-cpsF-cpsG gene of a group B streptococcal bacterium,
said polynucleotide comprising one or more nucleotides which differ between
GBS types.

Preferably the nucleotides which differ between GBS types correspond to 
one or more of positions 62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249, 
300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 
1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 
2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a sequence
delineated by the 3' 136 base pairs of cpsE and the 5' 218 base pairs of cpsG of
the cpsE-cpsF-cpsG gene cluster of a group B streptococcal bacterium, said
polynucleotide comprising one or more nucleotides which differ between GBS
types.

Preferably the nucleotides which differ between group B streptococcal
types correspond to one or more of positions 1413, 1495, 1500, 1501, 1512,
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892,
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a cpsI/M
gene of a group B streptococcal bacterium, said polynucleotide comprising one or
more nucleotides which differ between group B streptococcal types.

Preferably the polynucleotide is selected from the nucleotide sequences
shown in Table 2.

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
a rib, alp2 or alp3 gene of a group B streptococcal bacterium, said polynucleotide
comprising one or more nucleotides which differ between GBS protein antigen
gene subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 6. 

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
IS861, IS1548, IS1381, ISSa4 and/or GBSi1 of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ
between GBS mobile genetic element subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences
shown in Table 10.

The polynucleotides of the invention may be used in a method of typing, 
such as serotyping and/or subtyping, a group B streptococcal bacterium. 

In a third aspect the present invention provides a composition comprising a
plurality of polynucleotides of the second aspect of the invention. The
composition may be used in a method of typing, such as serotyping and/or
subtyping, a group B streptococcal bacterium.

In a fourth aspect the present invention provides a microarray comprising a
plurality of polynucleotides according to the second aspect of the invention. The
microarray may be used in a method of typing, such as serotyping and/or
subtyping, a group B streptococcal bacterium.

Detailed description of the invention 

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as commonly understood by one of ordinary skill in the
art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization
techniques and biochemistry). Standard techniques are used for molecular,
genetic and biochemical methods (see generally, Sambrook et al., Molecular
Cloning: A Laboratory Manual, 3rd ed. (2001) Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular
Biology (1999) 4th Ed, John Wiley & Sons, Inc. - and the full version entitled
Current Protocols in Molecular Biology, which are incorporated herein by
reference) and chemical methods.

The molecular typing methods of the present invention rely on detecting
the presence in a sample of specific polynucleotide sequences in regions of the
genome of group B streptococci (GBS) that we have identified as varying
between different types.

More specifically, the specific polynucleotide sequences that are to be
detected lie within the cpsD, cpsE, cpsF, cpsG, cpsI, cpsM, rib, alp2 and/or alp3
genes of GBS, as well as the mobile genetic elements IS861, IS1548, IS1381,
ISSa4 and GBSi1, preferably the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes.
Regions of interest within those genes are regions whose
sequence varies between two or more types, i.e. are heterogeneous.
Heterogeneity may be due to insertions, deletions and/or substitutions between
corresponding regions in different types. In the case of rib, alp2 and alp3,
heterogeneity typically takes the form of the presence or absence of the entire
gene. Similarly, for elements IS861, IS1548, IS1381, ISSa4 and GBSi1,
heterogeneity typically takes the form of the presence or absence of the entire
sequence.
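
The notion of a heterogeneous position can be illustrated with a short Python sketch that reports the alignment columns at which aligned sequences differ. The sequences and type labels below are hypothetical toy data, not the sequences of Figure 1, and a gap character is treated as just another state so that insertions and deletions are also reported.

```python
def heterogeneous_positions(aligned):
    """Return the 1-based alignment columns at which the sequences differ."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A column is heterogeneous if it holds more than one distinct character.
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


# Hypothetical toy alignment; '-' marks a gap (insertion or deletion).
toy_alignment = {
    "type_A": "ATGCCGTA-TTG",
    "type_B": "ATGCTGTA-TTG",
    "type_C": "ATGCCGTAATTG",
}
variable = heterogeneous_positions(toy_alignment)
print(variable)
```

In this toy case one column differs by substitution and one by insertion; presence or absence of a whole gene or element is a separate kind of heterogeneity that this sketch does not cover.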

Specific regions of heterogeneity include the following positions within
cpsD gene - 62 and 78-86; cpsD-cpsE gene spacer - 138, 139 and 144; cpsE
gene - 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486,
602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413,
1495, 1500, 1501, 1512, 1518 and 1527; cpsF gene - 1595, 1611, 1620, 1627,
1629, 1655, 1832, 1856, 1866, 1871, 1892 and 1971; and cpsG gene - 2026,
2088, 2134, 2187 and 2196 (numbering corresponds to numbering shown in
Figure 1).

Particularly preferred positions of interest are those that lie within a 790 bp
fragment of cpsE-cpsF-cpsG (which consists of approximately the 3' 136 bases
of cpsE to the 5' 218 bases of cpsG), namely positions 1413, 1495, 1500, 1501,
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871,
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

Another region of heterogeneity is position 62 of cpsD and a repetitive
sequence (TTACGGCGA) found at positions 78 to 86 of cpsD in some but not all
GBS serotypes.

Specific regions of heterogeneity also include a number of positions within
the cpsI/M gene as shown in the sequence alignment depicted in Figure 3.

These regions of heterogeneity may be analysed using a variety of means 
including sequencing, PCR and binding of labelled probes. 

In the case of sequencing to identify serotype, the sequencing primers are
selected such that they hybridise specifically to a region within or near to a region
within which a region of heterogeneity is present. The primers need not be
specific to particular serotypes, since it is the actual sequence information obtained
during the sequencing process which is used to assign molecular serotype. Thus
the primers may hybridise specifically to all GBS serotypes (at least serotypes Ia
to VII), or to specific serotypes.

Preferred primers anneal within 100, 50 or 20 contiguous nucleotides of a
heterogeneous position within the 790 bp region of cpsE-cpsF-cpsG shown in
Figure 1. Examples of suitable sequencing primers are shown in Table 2
(cpsES3, cpsFA, cpsFS, cpsGA and cpsGA1).

PCR and other specific hybridisation-based serotyping methods will
typically involve the use of nucleotide primers/probes which bind specifically to a
region of the genome of a GBS serotype which includes a nucleotide which varies
between two or more serotypes. Thus the primers/probes may comprise a
sequence which is complementary to one of such regions. Where positions of
heterogeneity are close together (e.g. positions 198, 204, 211 and 218 of cpsE), it
may be desirable to use a primer/probe which hybridises specifically to a region
of the GBS genome that comprises two or more positions of heterogeneity. Thus
for example, a primer/probe may be designed that is complementary to
nucleotides 195 to 220 of cpsE. Such primers/probes are likely to have improved
specificity and reduce the likelihood of false positives.
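
As a minimal sketch of how such a probe could be derived, the Python below takes a reference sequence and a pair of 1-based inclusive coordinates and returns the complementary oligonucleotide written 5' to 3'. The reference string is a hypothetical placeholder rather than the sequence of Figure 1, and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_for_region(reference, start, end):
    """Return the probe complementary to 1-based inclusive positions start..end."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("coordinates fall outside the reference sequence")
    target = reference[start - 1:end]  # convert to 0-based, end-exclusive slice
    return reverse_complement(target)


# Hypothetical 240 nt placeholder standing in for a numbered reference sequence.
reference = "ACGT" * 60
probe = probe_for_region(reference, 195, 220)
print(len(probe), probe)
```

A probe spanning positions 195 to 220 is 26 nucleotides long and, in a real reference, would cover several closely spaced heterogeneous positions at once.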

PCR-based methods of detection may rely upon the use of primer pairs, at
least one of which binds specifically to a region of interest in one or more, but not
all, serotypes. Unless both primers bind, no PCR product will be obtained.
Consequently, the presence or absence of a specific PCR product may be used
to determine the presence of a sequence indicative of specific GBS serotypes.
However, as mentioned, only one primer need correspond to a region of
heterogeneity in the genes of interest (such as the cpsD, cpsE, cpsF, cpsG, cpsI
and/or cpsM genes). The other primer may bind to a conserved or heterogeneous
region within said gene or even a region within another part of the GBS genome,
such as the cpsH gene, whether said region is conserved or heterogeneous
between serotypes. Thus, for example, a combination of a primer (cpsGS) which
binds to a region of the cpsG gene including positions 2172 to 2210, and a primer
which binds to a region of the cpsH gene which is heterogeneous (IacpsHA1,
IIIcpsHA), may be used as the basis of distinguishing serotypes (Ia and III).

Further, a primer which binds to a region of cpsI which is heterogeneous
may be combined with a primer which binds to a region of cpsG which is
constant. An example of such a primer pair is VIcpsIA and cpsGS1,
which gives rise to a PCR product of 1517 bp and is specific for GBS serotype VI.

Alternatively, primers that bind to conserved regions of the GBS genome 
but which flank a region whose length varies between serotypes may be used. In 
this case, a PCR product will always be obtained when GBS bacteria are present 
but the size of the PCR product varies between serotypes. 
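
The two PCR read-outs described above (presence or absence of a product, and product size) can be sketched as a simple in silico search. This is an illustration under stated assumptions only: primers are required to match exactly, the templates and primers are hypothetical toy sequences, and real PCR tolerates mismatches and depends on reaction conditions that this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Product length for exact-match primers, or None if either fails to bind."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding site
    # appears on the given strand as its reverse complement.
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


# Hypothetical primers binding conserved flanks of a length-variable region.
FORWARD = "GATTACAGGC"
REVERSE = reverse_complement("CCTGAAGTCA")
templates = {
    "short_type": "TT" + FORWARD + "ACGT" * 2 + "CCTGAAGTCA" + "AA",
    "long_type": "TT" + FORWARD + "ACGT" * 5 + "CCTGAAGTCA" + "AA",
    "no_reverse_site": "TT" + FORWARD + "ACGT" * 5 + "AA",
}
sizes = {name: amplicon_length(t, FORWARD, REVERSE) for name, t in templates.items()}
print(sizes)
```

A result of None corresponds to the case in which no product is obtained because one primer does not bind, while differing lengths correspond to primers that flank a region whose length varies between types.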

Furthermore, a combination of specific binding of one or both primers and
variations in the length of the PCR product may be used as a means of identifying
particular molecular serotypes.

Examples of specific primers/probes which target the cpsD, cpsE, cpsF,
cpsG, cpsI or cpsM genes include the following:



cpsDS     GCA AAA GAA CAG ATG GAA CAA AGT GG
cpsES     CTT TTG GAG TCG TGG CTA TCT TG
cpsEA1    GA/T/GA AAA AAG GAA AGT CGT GTC G/ATT G
cpsES1    CTT GGA CHTC CTC TGA AAA GGA TTG
cpsEA2    AAA A/CGC TTG ATC AAC AGT TAA GCA GG
cpsES2    GAT GGT/C GGA CCG GCT ATC TTT TCT C
cpsEA3    CTT AAT TTG TTC TGC ATC TAC TCG C
cpsES3    GTT AGA TGT TCA ATA TAT CAA TGA ATG GTC TAT TTG GTC AG
cpsEFA    CCT TTC AAA CCT TAC CTT TAC TTA GC
cpsFS     CAT CTG GTG CCG CTG TAG CAG TAC CAT T
cpsFA     GTC GAA AAC CTC TAT A/GTA AAC/T GGT CTT ACA A/GCC AAA TAA CTT ACC
cpsGA     AAG/C AGT TCA TAT CAT CAT ATG AGA G
cpsGA1    CCG CCA/G TGT GTG ATA ACA ATC TCA GCT TC
cpsGS     ATG ATG ATA TGA ACT CTT ACA TGA AAG AAG CTG AGA TTG
cpsGS1    GAA CTC TTA CAT GAA AGA AGC TGA GAT TGT TAT CAC AC
IbcpsIA   CTA TCA ATG AAT GAG TCT GTT GTA GGA CGG ATT GCA CG
IbcpsIS   GAT AAT AGT GGA GAA ATT TGT GAT AAT TTA TCT CAA AAA GAC G
IbcpsIA1  CCT GAT TCA TTG CAG AAG TCT TTA CGA TGC GAT AGG TG
IVcpsMA   GGG TCA ATT GTA TCG TCG CTG TCA ACA AAA CCA ATC AAA TC
VcpsMA    CCC CCC ATA AGT ATA AAT AAT ATC CAA TCT TGC ATA GTC AG
VIcpsIA   GAA GCA AAG ATT CTA CAC AGT TCT CAA TCA CTA ACT CCG
cpsIA     GTA TAA CTT CTA TCA ATG GAT GAG TCT GTT GTA GTA CGG

The primer designations correspond to those given in Table 2. 
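
Several of the primers listed above contain alternatives written with a slash, such as A/C. Assuming the conventional reading, in which each slash-separated group denotes alternative bases at a single position, a primer written this way can be converted to a regular expression for searching sequence data. The following Python is a sketch under that assumption; the function name is illustrative.

```python
import re


def primer_to_regex(primer):
    """Convert a primer written with slash alternatives into a regex pattern."""
    compact = primer.replace(" ", "").upper()
    if not compact or not set(compact) <= set("ACGT/"):
        raise ValueError("primer may contain only A, C, G, T, '/' and spaces")
    if compact[0] == "/" or compact[-1] == "/" or "//" in compact:
        raise ValueError("each '/' must sit between two bases")
    pattern = []
    i = 0
    while i < len(compact):
        alternatives = [compact[i]]
        i += 1
        # Collect every further base joined to this position by a slash.
        while i < len(compact) and compact[i] == "/":
            alternatives.append(compact[i + 1])
            i += 2
        if len(alternatives) == 1:
            pattern.append(alternatives[0])
        else:
            pattern.append("[" + "".join(alternatives) + "]")
    return "".join(pattern)


cpsEA2 = primer_to_regex("AAA A/CGC TTG ATC AAC AGT TAA GCA GG")
cpsES2 = primer_to_regex("GAT GGT/C GGA CCG GCT ATC TTT TCT C")
print(cpsEA2)
print(cpsES2)
print(bool(re.fullmatch(cpsEA2, "AAACGCTTGATCAACAGTTAAGCAGG")))
```

Each bracketed class matches any one of the listed bases, so a single pattern covers every variant of a degenerate primer.
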

In relation to the alp2, alp3 and rib surface protein antigen genes,
heterogeneity and protein antigen gene subtype are assessed more at the level of
whether a group B streptococcal bacterium contains the gene or not. Our results
show that the specific combination of surface protein genes present in a GBS
genome is indicative of serotype/serosubtypes (see Table 9). Consequently,
primers/probes suitable for use in the methods of the present invention are those
that are specific for the particular genes. Thus probes/primers that are specific
for alp2 or alp3 or rib are preferred. Figure 4 shows an alignment of alp2 and
alp3 that was used to design primers specific for alp2 or specific for alp3.

Examples of specific primers/probes which target the alp2, alp3 and rib
genes include the following:

bcaS1     GGT AAT CTT AAT ATT TTT GAA GAG TCA ATA GTT GCT GCA TCT AC
bcaS2     CCAGGGA GTG CAG CGA CCT TAA ATA CAA GCA TC
balS      GAT CCT CAA AAC CTC ATT GTA TTA AAT CCA TCA AGC TAT TC
balA      CCA GTT AAG ACT TCA TCA CGA CTC CCA TCA C
bal23S1   CAG ACT GTT AAA GTG GAT GAA GAT ATT ACC TTT ACG G
bal23S2   CTT AAA GCT AAG TAT GAA AAT GAT ATC ATT GGA GCT CGT G
bal2S     CTT CCG CCA GAT AAA ATT AAG
bal2A     CTG TTG ACT TAT CTG GAT AGG TC
bal2A1    CGT GTT GTT CAA CAG TCC TAT GCT TAG CCT CTG GTG
bal2A2    GGT ATC TGG TTT ATG ACC ATT TTT CCA GTT ATA CG
bal3S     GTT CTT CCG CTT AAG GAT AG
bal3A     GAC CGT TTG GTC CTT ACC TTT TGG TTC GTT GCT ATC C
ribS2     GAAGTAATTTCAG GAA GTG CTG TTA CGT TAA ACA CAA ATA TG
ribA1     GAA GGT TGT GTG AAA TAA TTG CCG CCT TGC CTA ATG
ribA2     AAT ACT AGC TGC ACC AAC AGT AGT CAA TTC AGA AGG

The primer designations correspond to those given in Table 6.

In relation to IS861, IS1548, IS1381, ISSa4 and GBSi1, heterogeneity
and subtype are assessed more at the level of whether a group B streptococcal
bacterium contains the element or not. The number of elements may also be
assessed. Our results show that the specific combination of mobile elements
present in a GBS genome is indicative of serotype/serosubtype (see Table 12).
Consequently, primers/probes suitable for use in the methods of the present
invention are those that are specific for the particular mobile genetic elements.
Thus probes/primers that are specific for IS861, IS1548, IS1381, ISSa4 and
GBSi1 are preferred.
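
Because subtype is read from which elements are present, a typing result for the mobile genetic elements can be recorded as a simple presence/absence profile. The Python below is a minimal sketch of such a record; the function names and the plus/minus encoding are illustrative assumptions, and the mapping from profile to serotype/serosubtype (Table 12) is deliberately not reproduced here.

```python
ELEMENTS = ("IS861", "IS1548", "IS1381", "ISSa4", "GBSi1")


def element_profile(detected):
    """Return a presence/absence tuple in the fixed order given by ELEMENTS."""
    unknown = set(detected) - set(ELEMENTS)
    if unknown:
        raise ValueError("unrecognised element(s): " + ", ".join(sorted(unknown)))
    return tuple(name in detected for name in ELEMENTS)


def profile_code(profile):
    """Render a profile compactly, '+' for present and '-' for absent."""
    return "".join("+" if present else "-" for present in profile)


# Hypothetical isolate in which two of the five elements were detected.
code = profile_code(element_profile({"IS861", "IS1381"}))
print(code)
```

Isolates with identical codes share the same combination of elements; the same approach would apply to the surface protein antigen genes.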

Examples of specific primers/probes which target IS861, IS1548, IS1381,
ISSa4 and GBSi1 include the following:

IS861S    GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG
IS861A1   CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C
IS861A2   CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG
IS1548S   CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC
IS1548S1  GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG
IS1548A1  CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC
IS1548A2  CCC AAT ACC ACG TAA CTT ATG CCA TTT G
IS1548A3  CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC
IS1381S1  CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG
IS1381S2  GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG
IS1381A   CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC
ISSa4S    CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C
ISSa4A1   GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G
ISSa4A2   CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC
GBSi1S1   CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG
GBSi1S2   GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC
GBSi1A1   AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC
GBSi1A2   CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG

Preferably, the primers/probes comprise at least 10, 15 or 20 nucleotides.
Typically, primers/probes consist of fewer than 100, 50 or 30 nucleotides.
Primers/probes are generally polynucleotides comprising deoxynucleotides. They
may also be polynucleotides which include within them synthetic or modified
nucleotides. A number of different types of modification to oligonucleotides are
known in the art. These include methylphosphonate and phosphorothioate
backbones, and the addition of acridine or polylysine chains at the 3' and/or 5'
ends of the molecule. For the purposes of the present invention, it is to be
understood that the polynucleotides described herein may be modified by any method
available in the art. Primers/probes may be labelled with any suitable detectable
label such as radioactive atoms, fluorescent molecules or biotin.

In one embodiment, primers/probes have a high melting temperature of
>70°C so that they may be used in rapid cycle PCR.
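
The text does not state how melting temperature is estimated. Purely as an illustration, the Python sketch below screens candidate oligonucleotides by length and by a widely quoted basic approximation, Tm = 64.9 + 41 x (G + C - 16.4) / N, which is intended for oligonucleotides longer than about 13 nucleotides. Nearest-neighbour calculations and salt corrections can give substantially different values, so the threshold check here is an assumption-laden sketch and not the method used to design the primers listed above. The candidate sequences are hypothetical.

```python
def basic_tm(seq):
    """Approximate Tm in deg C using 64.9 + 41 * (G + C - 16.4) / N."""
    seq = seq.replace(" ", "").upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def acceptable_primer(seq, min_len=10, max_len=100, min_tm=70.0):
    """True if the length is in [min_len, max_len) and the estimated Tm exceeds min_tm."""
    length = len(seq.replace(" ", ""))
    if not min_len <= length < max_len:
        return False
    return basic_tm(seq) > min_tm


# Hypothetical candidates chosen only to exercise the filter.
candidates = {"gc_rich": "GC" * 15, "at_rich": "AT" * 15, "too_short": "GCGCGC"}
results = {name: acceptable_primer(seq) for name, seq in candidates.items()}
print(results)
```

The default length bounds mirror the broadest limits given above (at least 10 and fewer than 100 nucleotides); tighter bounds can be passed as arguments.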

Compositions comprising a plurality of nucleotides that are used to analyse
one or more regions within the cpsD, cpsE, cpsF, cpsG, or cpsI/M genes may
also further comprise nucleotides that may be used to analyse one or more
regions within the cpsH gene. Suitable nucleotides are described in the
Examples, although a person skilled in the art could design other suitable
sequences based on the sequence alignment shown in Figure 3.

Further, compositions comprising a plurality of nucleotides that are used to
analyse one or more regions within alp2, alp3 or rib genes may also further
comprise nucleotides that may be used to analyse one or more regions within the
C alpha (bca) and C beta (bac) genes (C beta gene also known as bag).

A variety of techniques may be used to analyse one or more regions within
the genome of a bacterium of interest. Typically, a sample of interest which is
suspected of containing GBS bacteria is treated, using standard techniques, to
obtain genomic DNA from any microorganisms present in the sample. It may be
desirable for a number of subsequent detection steps to use nucleic acid
preparation techniques that result in substantial fragmentation of the genomic
DNA. The sample may be from a bacterial culture or a clinical sample from a
patient, typically a human patient. Clinical samples may be cultured to produce a
bacterial culture. However, it is also possible to test clinical samples directly
without a culturing step.

The genomic DNA is then subjected to one or more analysis steps which
may include sequencing, enzymatic amplification and/or hybridisation. These
general techniques of DNA analysis are known in the art and are discussed in
detail in, for example, Sambrook et al. 2001 and Ausubel et al. 1999 supra.

Serotyping may involve one or more steps. For example, it may be
desirable to carry out an initial step of determining whether there are nucleotide
sequences present in the sample which are conserved between GBS serotypes
but not found in any other organism. This may be achieved by using PCR
primers that detect any (but only) GBS bacteria (e.g. using primer pairs
Sag59/Sag190 and/or DSF2/DSR1 - see Tables 2 and 3).

Molecular serotyping for specific GBS serotypes can then be performed by
detecting the presence of one or more regions of heterogeneity in the regions of
interest using any suitable technique such as sequencing, enzymatic
amplification and/or hybridisation based on the probes/primers discussed above.

A particularly preferred detection technique is PCR, such as rapid cycle
PCR (Kong et al., 2000).

An example of a multi-step serotyping strategy (algorithm) is shown in
Figure 2. However, a variety of other strategies are envisaged and can be
designed by the skilled person using the sequence heterogeneity information
presented herein. In particular, it is preferred that the serotyping procedure
comprise at least one analysis step based on analysing one or more regions of the
cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes. This analysis may optionally be
combined with an analysis of one or more regions within the cpsH gene. Similar
techniques may be used to analyse the cpsH gene regions and suitable primer
sequences and methods are also described in the Examples.

Analysis of the presence or absence of the alp2, alp3 and/or rib genes may
optionally be combined with an analysis of the presence or absence of C alpha
(bca gene) and C beta (bac gene) sequences, as is described in the Examples.
Similar techniques may be used to analyse these regions and suitable primer
sequences and PCR methods are also described in the Examples.

Furthermore, analysis of the presence or absence of the alp2, alp3 and/or
rib genes (and optionally the bca and bac genes) may be combined with an
analysis of the presence or absence of mobile genetic elements.

Thus a typing strategy may involve an analysis of cps genes, surface
protein genes and/or mobile genetic elements in various combinations to provide
more serosubtyping and subtyping information.

Analysis of GBS genomic sequences using the above techniques may 
take place in solution followed by standard resolution using methods such as gel 
electrophoresis. However in a preferred aspect of the invention, the 
primers/probes are immobilised onto a solid substrate to form arrays. 

The polynucleotide probes are typically immobilised onto or in discrete
regions of a solid substrate. The substrate may be porous to allow immobilisation
within the substrate or substantially non-porous, in which case the probes are
typically immobilised on the surface of the substrate. Examples of suitable solid
substrates include flat glass (such as borosilicate glass), silicon wafers, mica,
ceramics and organic polymers such as plastics, including polystyrene and
polymethacrylate. It may also be possible to use semi-permeable membranes
such as nitrocellulose or nylon membranes, which are widely available. The
semi-permeable membranes may be mounted on a more robust solid surface such as
glass. The surfaces may optionally be coated with a layer of metal, such as gold,
platinum or other transition metal.

Preferably, the solid substrate is generally a material having a rigid or
semi-rigid surface. In preferred embodiments, at least one surface of the
substrate will be substantially flat, although in some embodiments it may be
desirable to physically separate synthesis regions for different polymers with, for
example, raised regions or etched trenches. It is also preferred that the solid
substrate is suitable for the high density application of DNA sequences in discrete
areas of typically from 50 to 100 µm, giving a density of 10000 to 40000 cm⁻².
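
The relationship between feature size and density can be checked with a short calculation. The sketch below assumes that the 50 to 100 µm figure is treated as the centre-to-centre pitch of a square grid, which is an interpretive assumption rather than something the text states.

```python
def spots_per_cm2(pitch_um):
    """Spots per square centimetre for a square grid with the given pitch in micrometres."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    spots_per_cm = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return spots_per_cm ** 2


coarse = spots_per_cm2(100)  # 100 spots per cm in each direction
fine = spots_per_cm2(50)     # 200 spots per cm in each direction
print(coarse, fine)
```

Under that assumption, a 100 µm pitch corresponds to the lower bound of the stated density range and a 50 µm pitch to the upper bound.
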
The solid substrate is conveniently divided up into sections. This may be
achieved by techniques such as photoetching, or by the application of
hydrophobic inks, for example teflon-based inks (Gel-line, USA). Discrete
positions, in which each different probe is located, may have any convenient
shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc.

Attachment of the library sequences to the substrate may be by covalent
or non-covalent means. The library sequences may be attached to the substrate
via a layer of molecules to which the library sequences bind. For example, the
probes may be labelled with biotin and the substrate coated with avidin and/or
streptavidin. A convenient feature of using biotinylated probes is that the
efficiency of coupling to the solid substrate can be determined easily. Since the
polynucleotide probes may bind only poorly to some solid substrates, it is often
necessary to provide a chemical interface between the solid substrate (such as in
the case of glass) and the probes. Thus, the surface of the substrate may be
prepared by, for example, coating with a chemical that increases or decreases
the hydrophobicity or coating with a chemical that allows covalent linkage of the
polynucleotide probes. Some chemical coatings may both alter the hydrophobicity
and allow covalent linkage. Hydrophobicity on a solid substrate may readily be
increased by silane treatment or other treatments known in the art. Examples of
suitable chemical coatings include polylysine and poly(ethyleneimine). Further
details of methods for attachment are provided in US Patent No. 6,248,521.
Methods for immobilizing nucleic acids by introduction of various functional
groups to the molecules are also described in Bischoff et al., 1987 (Anal.
Biochem., 164:336-344) and Kremsky et al., 1987 (Nucl. Acids Res. 15:2891-
2910).

Techniques for producing immobilised arrays of nucleic acid molecules have
been described in the art. A useful review is provided in Schena et al., 1998,
TibTech 16: 301-306, which also gives references for the techniques described
therein.

Microarray-manufacturing technologies fall into two main categories:
synthesis and delivery. In the synthesis approaches, microarrays are prepared in
a stepwise fashion by the in situ synthesis of nucleic acids from biochemical
building blocks. With each round of synthesis, nucleotides are added to growing
chains until the desired length is achieved. A number of prior art methods describe
how to synthesise single-stranded nucleic acid molecule libraries in situ, using for
example masking techniques (photolithography) to build up various permutations of
sequences at the various discrete positions on the solid substrate. U.S. Patent No.
5,837,832 describes an improved method for producing DNA arrays immobilised to
silicon substrates based on very large scale integration technology. In particular,
U.S. Patent No. 5,837,832 describes a strategy called "tiling" to synthesize specific
sets of probes at spatially-defined locations on a substrate which may be used to
produce the immobilised DNA libraries of the present invention. U.S. Patent No.
5,837,832 also provides references for earlier techniques that may also be used.

The delivery technologies, by contrast, use the exogenous deposition of
preprepared biochemical substances for chip fabrication. For example, DNA may
be printed directly onto the substrate using, for example, robotic devices
equipped with either pins (mechanical microspotting) or piezoelectric devices (ink
jetting). In mechanical microspotting, a biochemical sample is loaded into a
spotting pin by capillary action, and a small volume is transferred to a solid
surface by physical contact between the pin and the solid substrate. After the first
spotting cycle, the pin is washed and a second sample is loaded and deposited to
an adjacent address. Robotic control systems and multiplexed printheads allow
automated microarray fabrication. Ink jetting involves loading a biochemical
sample, such as a polynucleotide, into a miniature nozzle equipped with a
piezoelectric fitting; an electrical current is then used to expel a precise amount of
liquid from the jet onto the substrate. After the first jetting step, the jet is washed
and a second sample is loaded and deposited to an adjacent address. A
repeated series of cycles with multiple jets enables rapid microarray production.

In one embodiment, the microarray is a high density array, comprising
greater than about 50, preferably greater than about 100 or 200 different nucleic
acid probes. Such high density arrays comprise a probe density of greater than
about 50, preferably greater than about 500, more preferably greater than about
1,000, most preferably greater than about 2,000 different nucleic acid probes per
cm². The array may further comprise mismatch control probes and/or reference
probes (such as positive controls).

Microarrays of the invention will typically comprise a plurality of 
primers/probes as described above. The primers/probes may be grouped on the 
array in any order. However, it may be desirable to group primers/probes
according to types (capsular polysaccharide gene serotypes, serosubtypes;
protein antigen gene subtypes; mobile genetic element subtypes), or groups of
types (capsular polysaccharide gene serotypes, serosubtypes; protein antigen
gene subtypes; mobile genetic element subtypes) for which they are specific.
Such grouping may be arranged such that the resulting patterns are easily
susceptible to pattern recognition by computer software.

Elements in an array may contain only one type of probe/primer or a 
number of different probes/primers. 

Detection of binding of GBS genomic DNA to immobilised probes/primers
may be performed using a number of techniques. For example, the immobilised
probes which are specific to a number of types (capsular polysaccharide gene
serotypes, serosubtypes; protein antigen gene subtypes; mobile genetic element
subtypes), may function as capture probes. Following binding of the genomic
DNA to the array, the array is washed and incubated with one or more labelled
detection probes which hybridise specifically to regions of the GBS genome
which are conserved. The binding of these detection probes may then be
determined by detecting the presence of the label. For example, the label may
be a fluorescent label and the array may be placed in an X-Y reader under a
charge-coupled device (CCD) camera.

Other techniques include labelling the genomic DNA prior to contact with 
the array (using nick-translation and labelled dNTPs for example). Binding of the 
genomic DNA can then be detected directly. 

It is also possible to employ a single PCR amplification step using labelled
dNTPs. In this embodiment, the genomic DNA fragment binds to a first primer
present in the array. The addition of polymerase, dNTPs (including some labelled
dNTPs) and a second primer results in synthesis of a PCR product incorporating
labelled nucleotides. The labelled PCR fragment captured on the plate may then
be detected.

A number of available detection techniques do not require labels but
instead rely on changes in mass upon ligand binding (e.g. surface plasmon
resonance, SPR). The principles of SPR and the types of solid substrates
required for use in SPR (e.g. BIACore chips) are described in Ausubel et al.,
1999, supra.

C. Uses

As discussed above, group B streptococcus (GBS) - Streptococcus
agalactiae - is the commonest cause of neonatal and obstetric sepsis and an
increasingly important cause of septicaemia in the elderly and
immunocompromised patients. Thus, the detection methods, probes/primers and
microarrays of the invention may be used in the diagnosis of GBS infections in
pregnant women, elderly and/or immunocompromised patients. The PCR and
microarray techniques described herein may be of particular use in routine
antenatal screening of pregnant women as well as in diagnosing infections in
pregnant women given the increased accuracy and sensitivity compared to
conventional identification and serotyping. These methods are also likely to give
faster results since it will not generally be necessary to culture clinical samples to
obtain enough material. Further, the molecular techniques can be used in most
laboratories without the need for specialist expertise or reagents.

The molecular typing methods of the invention may also assist in
comprehensive strain identification that will be useful for epidemiological and
other related studies that will be needed to monitor GBS isolates before and after
introduction of GBS conjugate vaccines.

The present invention will now be described in more detail with reference
to the following examples, which are illustrative only and non-limiting. The
examples refer to Figures:

Detailed description of the Figures. 

Figure 1. Molecular serotype identification based on the sequence heterogeneity
of the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG (relevant primers are
shown).

Figure 2. Algorithm for GBS molecular serotype (MS) identification by PCR and
sequencing.

Figure 3. Multiple sequence alignments of the gene sequences of cpsG-cpsH-
cpsI/M for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are
highlighted in bold).

Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158,
upper lines) and alp3 (AF291065, lower lines) used to distinguish them (relevant
primers are shown).

Figure 5. Genetic relationship of 194 invasive Australasian GBS strains (or 56
genotypes).

Notes for column headed "Genetic Markers of GBS genotypes":
Protein antigen gene profile codes are:
"A": 5'-end of bca positive;
"a" or "as": bca repetitive unit or bca repetitive unit-like region positive,
with multiple or single band amplicons, respectively;
"B": bac positive;
"R": rib positive;
"alp2": alp2 positive;
"alp3": alp3 positive;
"None": isolate contains none of the above protein genes.
The molecular markers in bold type show the common features in each cluster.

Notes for column headed "No. of strains":

The numbers after "+" are CSF isolates; the others are blood isolates.

Notes for column headed "Genotypes":

Each genotype was characterized by a distinct combination of the cps
genes, protein gene profiles and mobile genetic elements. The predominant
genotype in each serotype was named the number "1" genotype of that
serotype.

Notes for the dendrogram: 

At about distance 16, the 56 genotypes could be separated into 8 clusters
(1-8); at about distance 22.5 the 56 genotypes could be separated into 3 cluster
groups (A, B, C).

EXAMPLES 

MATERIALS AND METHODS

GBS reference strains and clinical isolates. 

A panel of nine GBS serotypes (Ia to VIII) was kindly provided by Dr
Lawrence Paoletti, Channing Laboratory, Boston USA (reference panel 1). Dr
Diana Martin, Streptococcus Reference Laboratory, at ESR, Wellington, New
Zealand, provided another panel of nine international reference GBS type-strains
including serotypes Ia to VI (reference panel 2) (Table 1). In addition, we tested
isolates from 205 clinical cases including 146 which had been referred from
various laboratories in New Zealand for serotyping and 59 isolated from normally
sterile sites over a period of 10 years in one diagnostic laboratory in Sydney. One
culture was subsequently shown to be mixed, so 206 different isolates were
examined. Conventional serotyping (CS) was performed at the Streptococcus
Reference Laboratory, at ESR, Wellington, New Zealand, and MS at the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia.

The two panels of GBS reference strains and 63 selected clinical isolates
were studied in more detail, by sequencing >2200 base pairs (bp) of each to
identify appropriate sequences for use in MS. These and the remaining clinical
isolates were then used to evaluate the MS method and compare results with
those of CS. Typing by both methods was done initially without knowledge of
results of the other.

Bacterial isolates were retrieved from storage by subculture on blood agar
plates (Columbia II agar base supplemented with 5% horse blood) and incubated
overnight at 37°C.

Invasive GBS clinical isolates 

All 194 isolates used in the study of mobile genetic elements were
recovered from the blood (177) or CSF (17) of 191 patients (107 female, 80 male,
four sex unrecorded; three cultures each contained mixed growth of two GBS
serotypes). 108 isolates were from specimens submitted for culture to the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia during 1996-2001 and 83 were referred to Institute of Environmental
Science and Research (ESR), Porirua, Wellington, New Zealand for serotyping,
from various diagnostic laboratories in New Zealand, during 1994-2000.

Patients were classified into age-groups for analysis of genotype
distribution as follows: neonatal, early onset (0-6 days); neonatal, late onset (7
days to 3 months); infant and child (4 months-14 years); young adult (15-45
years); middle-aged (46-60 years); elderly (>60 years).
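
The age-group classification above can be expressed as a short function. The following is a minimal sketch only: the function name is hypothetical, ages are assumed to be given in completed years, months and days, and the handling of ages between 3 and 4 months (treated here as late onset) is an assumption, since the stated ranges leave that interval undefined.

```python
def classify_age_group(years: int, months: int = 0, days: int = 0) -> str:
    """Classify a patient by age given in completed years, months and days.

    Category boundaries follow the age-groups listed in the text; ages are
    interpreted as completed units (an assumption of this sketch).
    """
    if years > 60:
        return "elderly"
    if years >= 46:
        return "middle-aged"
    if years >= 15:
        return "young adult"
    if years >= 1 or months >= 4:
        return "infant and child"
    # Under 4 completed months. Assumption: everything from 7 days up to
    # (but not including) 4 completed months is counted as late onset.
    if months == 0 and days <= 6:
        return "neonatal, early onset"
    return "neonatal, late onset"


if __name__ == "__main__":
    for age in [(0, 0, 3), (0, 2, 0), (3, 0, 0), (30, 0, 0), (50, 0, 0), (75, 0, 0)]:
        print(age, "->", classify_age_group(*age))
```

Taking completed units rather than a single number of days avoids having to fix a conversion between days and months, which the text does not specify.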

These isolates are mainly a subset of the isolates described above but 
with reference strains and non-invasive isolates excluded. 

Conventional serotyping (CS).

CS was performed using standard methodology (Wilkinson and Moody,
1969). Briefly, an acid-heated (56°C) extract was prepared for each isolate and
the serotype determined by immuno-precipitation of type-specific antiserum in
agarose. An isolate was considered positive for a particular serotype when the
precipitation occurring formed a line of identity with that of the control strain.
Antisera used were prepared at ESR in rabbits against serotypes Ia, Ib, Ic, II, III,
IV, V and the R protein antigen. Fourteen selected isolates, including six that
were nontypable using antisera against serotypes I-V, six that initially gave
discrepant results between CS and MS and two separate isolates from a mixed
culture, were kindly tested using antisera against all serotypes by Abbie Weisner
and Dr Androulla Efstratiou at Central Public Health Laboratory, Colindale,
London, UK.

Molecular serotype identification (MS); development of method. 

Oligonucleotide primers. 

The oligonucleotide primers used in this study, their target sites and
melting temperatures are shown in Tables 2, 6 and 10. Their specificities and
expected lengths of amplicons are shown in Tables 3, 7 and 11. The primers
were synthesised according to our specifications by Sigma-Aldrich (Castle Hill
NSW, Australia). Four previously published oligonucleotide primers, and a series
of new primers designed by us were used to sequence the genes of interest,
namely 16S/23S rRNA intergenic spacer region and partial cps gene cluster, or to
amplify unique sequences of individual GBS serotypes. Six previously published
oligonucleotide primers and a series of new primers designed by us were used to
sequence parts of and/or to specifically amplify genes encoding GBS surface
proteins. We also designed a series of primers to sequence parts of and/or to
specifically amplify five known GBS mobile genetic elements. Some were
designed with high melting temperatures (>70°C) to be used in rapid cycle PCR.

DNA preparation and polymerase chain reaction (PCR). 

Five individual GBS colonies or a sweep of culture were sampled using a
disposable loop and resuspended in 1 ml of digestion buffer (10 mM Tris-HCl, pH
8.0, 0.45% Triton X-100 and 0.45% Tween 20) in 2 ml Eppendorf tubes. The
tubes containing GBS suspension were heated at 100°C (dry block heater or
water bath) for 10 minutes then quenched on ice and centrifuged for 2 minutes at
14,000 rpm to pellet the cell debris. 5 µL of each supernatant containing
extracted DNA was used as template for PCR (Mawn et al., 1993).

PCR systems (25 µL for detection only, 50 µL for detection and
sequencing) were used as previously described (Kong et al., 1999). The
denaturation, annealing and elongation temperatures and times used were 96°C
for 1 second, 55-72°C (according to the primer Tm values or as previously
described) for 1 second and 74°C for 1 to 30 seconds (according to the length of
amplicons), respectively, for 35 cycles.

10 µL of PCR products were analysed by electrophoresis on 1.5% agarose
gels, which were stained with 0.5 µg ethidium bromide mL⁻¹. For
detection and/or serotype identification, the presence of PCR amplicons of
expected length, shown by ultraviolet transillumination, was accepted as
positive. For sequencing, 40 µL volumes of PCR products were further purified by
the polyethylene glycol precipitation method (Ahmet et al., 1999).

Sequencing. 

The PCR products were sequenced using Applied Biosystems (ABI) Taq
DyeDeoxy terminator cycle-sequencing kits according to standard protocols. The
corresponding amplification primers or inner primers were used as the
sequencing primers.

Multiple sequence alignments and sequence comparison. 

Multiple sequence alignments were performed with the Pileup and Pretty
programs in the Multiple Sequence Analysis program group. Sequences were
compared using the Bestfit program in the Comparison program group. All programs are
provided in WebANGIS, ANGIS (Australian National Genomic Information
Service), 3rd version.

Surface protein gene profile codes

Each isolate was given a protein gene profile code according to positive 
PCR results using various primer pairs, as shown in Table 7. 

Nucleotide sequence accession numbers.

The new sequence data described have been submitted to the GenBank
Nucleotide Sequence Databases and allocated the following accession numbers:
AF291411-AF291419 (16S/23S rRNA intergenic spacer regions for serotypes Ia
to VIII reference strains from reference panel 1); AF332893-AF332917,
AF363032-AF363060, AF367973, AF381030 and AF381031 (partial cps gene
clusters for two panels of reference strains (Table ) and selected representative
clinical isolates); AF367974 (partial bac gene sequence, with an insertion
sequence IS1381 from one isolate), AF362685-AF362704 (partial bac gene
sequences for all bac-positive isolates) and AF373214 (partial rib-like gene for
reference strain Prague 25/60, an R protein standard strain).

Previously reported sequence data referred to herein have appeared in the
GenBank Nucleotide Sequence Databases with the following accession numbers:
AB023574 (16S rRNA gene); U39765, L31412 (16S/23S rRNA intergenic spacer
regions); X68427 (S. oralis 23S rRNA gene); X72754 (cfb gene); AB028896 (cps
gene cluster for serotype Ia); AB050723 (partial cps gene cluster for serotype Ib);
AF163833 (cps gene cluster for serotype III); AF355776 (cps gene cluster for
serotype IV); AF349539 (cps gene cluster for serotype V); AF337958 (cps gene
cluster for serotype VI); M97256 (bca gene); X58470, X59771 (bac gene);
U58333 (rib gene); AF208158 (alp2 gene), AF291065-AF291072 (alp3 gene);
AF064785 (IS1381); M22449 (IS861); Y14270 (IS1548); AF064785 (IS1381);
AF165983 (ISSa4); and AJ292930 (GBSi1).

Statistical analysis and dendrogram.

SPSS version 11 software was used for statistical analysis. A dendrogram
was formed using Average Linkage (between groups) and Hierarchical Cluster
Analysis in SPSS version 11 software. The presence or absence of each marker -
MS Ia, Ib, II, IV-VI, sst III-1 to III-4; pgp "A", "R", "a", "as", "alp2", "alp3"; bac subgroups
1, 1a, 2, 3, 3a, 3b, 3c, 4, 4b, 5a, 7, 7a, 8, 9, 9a, 10, n1, n2; and mge IS1381,
IS861, IS1548, ISSa4, GBSi1 - was included in the analysis. The genotypes were
each characterized by a distinct combination of the molecular serotype (MS) or
sst, pgp and mge.
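
As an illustration of average linkage (between groups) clustering on presence/absence markers, the following is a minimal pure-Python sketch. It is not the SPSS procedure used in the study: the genotype names and marker profiles are invented toy data, and the use of squared Euclidean distance between binary profiles is an assumption, since the text does not state the distance measure.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two 0/1 marker profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (between groups).

    profiles: dict mapping a genotype name to a tuple of 0/1 marker values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean of all pairwise distances between members of a and b.
            total = sum(
                squared_euclidean(profiles[i], profiles[j]) for i in a for j in b
            )
            dist = total / (len(a) * len(b))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        dist, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, dist))
    return merges


# Hypothetical toy data: four genotypes scored for four markers.
toy_profiles = {
    "g1": (1, 0, 1, 0),
    "g2": (1, 0, 1, 1),
    "g3": (0, 1, 0, 0),
    "g4": (0, 1, 0, 1),
}

if __name__ == "__main__":
    for a, b, dist in average_linkage(toy_profiles):
        print(a, "+", b, "at distance", dist)
```

Cutting the merge history at a chosen distance yields the clusters, in the same way that the dendrogram notes above describe cutting at particular distances to obtain clusters and cluster groups.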

Example 1 - Study of inter- and intra-serotype/serosubtype sequence
heterogeneity in specific regions of the GBS genome and assessment of
suitability for molecular serotyping/serosubtyping.

Polymerase chain reaction. 

With two exceptions, all GBS-specific primer pairs produced amplicons of
the expected size from all reference strains and clinical isolates tested (Table 3).
The exceptions were Sag59/Sag190 and CFBS/CFBA. Both target the cfb gene,
but failed to produce amplicons from one clinical isolate, despite repeated
attempts. We assumed that this isolate either lacked the cfb gene or that the
gene was present in a mutant form. It has been suggested previously that PCR
targeting the cfb gene will not identify all GBS isolates (Hassan et al., 2000) and
that another primer pair based on 16S rRNA gene, DSF2/DSR1 (Ahmet et al.,
1999) was not entirely specific. Therefore, in this study, we used both primer
pairs (DSF2/DSR1 and Sag59/Sag190) to confirm all the isolates were GBS.

Sequence heterogeneity of 16S/23S rRNA intergenic spacer regions.

The 16S/23S rRNA intergenic spacer regions were sequenced for the
serotypes Ia to VIII from reference panel 1. Multiple sequence alignment showed
differences between serotypes at only two positions: 207 (serotype V is T or C
[T/C], serotypes VII and VIII are C, others are T) and 272 (serotype III is T, others
G). These regions are therefore unsuitable for MS.
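
Locating positions of heterogeneity in a multiple sequence alignment, as described above, amounts to scanning the alignment column by column. The sketch below is illustrative only: the function name and the short toy sequences are hypothetical and are not the deposited GBS sequences.

```python
def variable_sites(alignment):
    """Return {1-based position: {name: base}} for alignment columns that differ.

    alignment: dict mapping a sequence name to an aligned sequence string.
    All sequences must already be aligned to the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")
    sites = {}
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        if len(set(column.values())) > 1:
            sites[i + 1] = column
    return sites


# Hypothetical toy alignment (not real GBS sequence data).
toy_alignment = {
    "type_A": "ACGTTA",
    "type_B": "ACGCTA",
    "type_C": "ACGTTG",
}

if __name__ == "__main__":
    for position, bases in variable_sites(toy_alignment).items():
        print(position, bases)
```

A region with very few variable columns, such as the two positions reported for the intergenic spacer, gives too little signal to separate all serotypes, which is the reasoning behind rejecting it as a typing target.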

Sequence heterogeneity at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of
cpsG.

Using a series of primers targeting the 3'-end of cpsD-cpsE-cpsF and the
5'-end of cpsG, we amplified and sequenced 2226 or 2217 bp - depending on the
presence or absence of a nine-base repetitive sequence - from both panels of
reference strains (serotypes Ia to VII) and 63 selected clinical isolates.
Representative sequences were deposited into GenBank. See Table 1 for
GenBank accession numbers of reference panel strains.

Repetitive sequence. 

At the 3'-end region of cpsD, we found a nine-base repetitive sequence (TTA
GGG CGA) in most isolates of MS Ia and II, some of MS III, all of MS IV, V, and
VII, but none of the isolates of MS Ib or VI examined (Table 4). The presence or
absence of this repetitive sequence can be used to further subtype MS Ia, II and
III (see below).

Intra-serotype heterogeneity. 

In general, intra-serotype heterogeneity was low - there were minor random
variations in a few isolates of all serotypes except MS III, in which the intra-
serotype heterogeneity was more complex. MS III could be divided into four
sequence subtypes on the basis of heterogeneity at 22 positions - 62, 139, 144,
204, 300, 321, 429, 437, 457, 486, 602, 636, 971, 1026, 1194, 1413, 1501,
1512, 1518, 1527, 1629, and 2134 - and the presence or absence of the repetitive
sequence (at 78-86) (Table 4).

Among 60 MS III isolates (58 clinical isolates and two reference strains),
serosubtypes III-1 (30 isolates) and III-2 (22 isolates) were predominant. The
repetitive sequence was present in serosubtype III-1 but not III-2; there were
differences at seven other sites (139, 144, 204, 300, 321, 636, and 1629).

There were five isolates belonging to serosubtype III-3, which contained
the repetitive sequence and were identical with serosubtype III-1 at three variable
sites (139, 144, and 300) and with serosubtype III-2 at four (204, 321, 636 and
1629). Serosubtype III-3 differed from both serosubtypes III-1 and III-2 at seven
sites (486, 1026, 1413, 1512, 1518, 1527, and 2134). These seven sites in
serosubtype III-3 were identical with the corresponding sites of MS Ia.

There were three serosubtype III-4 isolates, whose sequences were nearly
identical with the corresponding sequence of MS II. The only exception was at
position 437, where the nucleotide was T in serosubtype III-4 (as in MS VII), and
C in MS II. This difference can be used (in addition to PCR, see below) to
differentiate serosubtype III-4 from MS II. Two serosubtype III-4 isolates
contained the repetitive sequence, and the other did not. Because of the small
number of serosubtype III-4 isolates, we did not use the repetitive sequence to
subtype them further.

Inter-serotype heterogeneity.

There were 56 sites of heterogeneity between the eight MS (Table 4). The
most suitable sites, for use in PCR/sequencing for MS, were a group of 23 sites
nearest to the 3'-end of the region (Table 4, Figure 1). Firstly, they were
consistent across two panels of reference strains and most clinical isolates (the
only exceptions were the small number of serosubtypes III-3 and III-4 isolates,
see below). Secondly, they were relatively concentrated within a 790 bp region,
which is a convenient length for sequencing in a single reaction. Thirdly, they
contained enough heterogeneity sites to allow differentiation, with few exceptions,
of MS Ia-VII. Based only on this 790 bp region, serosubtype III-3 cannot be
distinguished from MS Ia, nor serosubtype III-4 from MS II. However, they can be
identified by MS III-specific PCR (see below).

Serotype VIII does not form amplicons with primer pairs targeting the 790
bp region, but can be identified by exclusion after PCR identification of GBS. In
this study, one MS VIII isolate was identified, for which none of the primer pairs
that amplify the 2226 bp region (in addition to those that amplify the 790 bp
region) produced amplicons. This result was confirmed by the use of serotype
VIII-specific antiserum.

Mixed serotype-specificities in single isolates. 

Eleven isolates were identified as one MS on the basis of the MS-specific
PCR and overall sequence (within the 2226/2217 bp segment) but their
sequences differed at some sites from isolates of the same MS and shared site-
specific characteristics of another. They included five serosubtype III-3 isolates
and three serosubtype III-4 (see above). One non-serotypable reference strain
(Prague 25/60), which was identified as MS II, differed from other MS II isolates
at five sites at the 5'-end of the region, and was identical with MS III at three of
these sites. Prague 25/60 MS III-specific PCR was negative. One clinical isolate
identified as CS II, and MS II on the basis of its overall sequence, had bases at
nine sites at the 5'-end of the region that were characteristic of serotype Ib; MS
Ib-specific PCR was negative. Finally, one CS V reference strain (Prague 10/84)
had the same sequencing result as the corresponding sequence in GenBank
(AF349539), but both were different, at three sites at the 5'-end of the region,
from sequences of the other MS V strains that we studied.

All of these mixed-serotype specificities, except for those associated with
serosubtypes III-3 and III-4, occurred at the 5'-end region of the 2226/2217 bp
fragment. This supported our selection of the 790 bp 3'-end as the sequencing
target for MS. Using this target, all MS were correctly identified except for MS III
belonging to serosubtypes III-3 and III-4, which can be identified by MS III-
specific PCR (see Example 2).

Example 2 - Molecular serotype identification (MS) based on MS-specific
PCR targeting the 3'-end of cpsG-cpsH-cpsI/cpsM.

Our sequence alignment results showed that there was significant
sequence heterogeneity in the 3'-end of cpsG-cpsH-cpsI/cpsM (Figure 3), which
makes it appropriate for use in the design of specific primer pairs for
differentiation of serotypes Ia, Ib, III, IV, V, and VI directly by PCR. To fulfil
possible additional future requirements - for example, development of multiplex
PCR and/or to allow further evaluation of the sequence typing method - we
designed several primer pairs for each serotype (Tables 2 & 3). Using two panels
of reference strains and the specified conditions, all primer pairs amplified DNA
only from the corresponding serotypes. When clinical isolates were tested, similar
results were obtained with two sets of MS-specific primer pairs. In general, more
stringent conditions (lower primer concentration, higher annealing temperatures)
could be used with primers generating smaller amplicons. Those selected for MS
are shown in Table 3 and Figure 2.

An MS was assigned, by PCR, to 179 of 206 (86.9%) clinical isolates as
follows: MS Ia 40; MS Ib 35; MS III 58 (including those previously identified as
serosubtypes III-3 and III-4); MS IV 7; MS V 36; MS VI 3.
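
The per-serotype counts above can be checked against the stated total and percentage with a few lines of arithmetic. This is a simple consistency check using only the figures given in the text; the variable names are illustrative.

```python
# Numbers of clinical isolates assigned to each MS by specific PCR (from the text).
ms_by_pcr = {"Ia": 40, "Ib": 35, "III": 58, "IV": 7, "V": 36, "VI": 3}
total_isolates = 206

assigned = sum(ms_by_pcr.values())
percent_assigned = round(100 * assigned / total_isolates, 1)

if __name__ == "__main__":
    print(f"{assigned} of {total_isolates} isolates ({percent_assigned}%) typed by PCR")
```

The remaining isolates are those for which no MS-specific PCR was available, namely MS II and MS VIII, which were assigned by sequencing or by exclusion.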
Example 3 - Comparison of serotype identification results between MS
and CS.

After CS and MS had been completed, the results were compared. Initial
results were discrepant for 15 isolates, all but five of which (see below) were
resolved by retesting and/or correction of clerical errors.

The CS and MS/sequence subtyping results are shown in Table 5. An MS
was assigned to all isolates by PCR and/or sequencing, compared with 188 of
206 (91.3%) by CS. Specific PCR has not yet been developed for MS II and VIII,
so all MS II isolates were identified by sequencing only and one presumptive
MS VIII isolate was identified by exclusion (see Example 1). For all other isolates,
the results of PCR and sequencing were consistent, except for serosubtypes III-3
and III-4 and other minor sequence differences described above (Example 1). CS
results correlated well with PCR results.

Final CS and MS results were the same for all 188 isolates (100%) for
which results for both methods were available. Eighteen clinical isolates that were
non-serotypable by CS were assigned MS as follows: Ia, two; Ib, five; II, one;
serosubtype III-1, three; serosubtype III-2, one; V, five; and VI, one.

Sequences (2217 bp) of three clinical isolates that we identified as MS VI
were identical with those for serotype VI reference strains and the corresponding
sequence in GenBank (AF337958).

Mixed culture. 

Four clinical isolates gave positive results with MS III-specific PCR, but
were provisionally identified as MS II by sequencing. Three were CS III and one
CS II, with a weak cross-reaction with serotype III antiserum. These isolates were
studied further by subculturing 12 individual colonies of each. All subcultures
were tested by MS III-specific PCR. All 12 colony subcultures of the three CS III
isolates were positive by MS III-specific PCR and the isolates were therefore
classified as serosubtype III-4 (see above). However, 11 of 12 colony subcultures
of the fourth isolate were negative by MS III-specific PCR; and one was positive
by MS III-specific PCR. It was therefore assumed that this was a mixed culture,
predominantly of MS/CS II. The one MS III-specific PCR positive colony was
subsequently identified as serosubtype III-2 and included as an additional clinical
isolate (total 206 in all).

Example 4 - Algorithm for serotype assignment of GBS by PCR and
sequencing

As an example of how the PCR and sequencing methods described above
may be used clinically to perform GBS serotype identification, we designed an
algorithm for clinical use. All the primers used (except the inner sequencing
primers) were given high melting temperatures (>70°C), so rapid cycle PCR could be
used (Figure 2) (see Table 2 for primer sequences).
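
The decision logic described in Examples 1 to 3 can be sketched in code. The following is a hedged illustration only and is not a transcription of Figure 2: the function name, argument names and returned labels are hypothetical, and the flow is reconstructed from the text (specific PCR for MS Ia, Ib, III, IV, V and VI; sequencing of the 790 bp region for the remainder; MS VIII by exclusion when a confirmed GBS isolate yields no amplicon for sequencing).

```python
from typing import Optional, Set

# Serotypes for which MS-specific PCR is described in the text.
PCR_TYPES = ("Ia", "Ib", "III", "IV", "V", "VI")


def assign_ms(gbs_confirmed: bool, pcr_positive: Set[str],
              seq_call: Optional[str]) -> str:
    """Assign a molecular serotype from PCR and sequencing results.

    gbs_confirmed: True if GBS-specific PCR identified the isolate as GBS.
    pcr_positive:  serotypes whose MS-specific PCR gave an amplicon.
    seq_call:      serotype suggested by the 790 bp sequence, or None if the
                   region could not be amplified for sequencing.
    """
    if not gbs_confirmed:
        return "not GBS"
    hits = [t for t in PCR_TYPES if t in pcr_positive]
    if len(hits) > 1:
        return "possible mixed culture: retest single colonies"
    if hits == ["III"]:
        # The 790 bp sequence alone resembles MS Ia (III-3) or MS II (III-4).
        # A mixed II/III culture gives the same pattern as III-4; in the text
        # this was resolved by testing individual colonies.
        if seq_call == "Ia":
            return "III (serosubtype III-3)"
        if seq_call == "II":
            return "III (serosubtype III-4)"
        return "III"
    if hits:
        return hits[0]
    if seq_call is None:
        return "VIII (by exclusion)"
    return seq_call  # e.g. MS II, identified by sequencing only


if __name__ == "__main__":
    print(assign_ms(True, {"V"}, "V"))
    print(assign_ms(True, set(), "II"))
    print(assign_ms(True, set(), None))
```

Checking the MS III-specific PCR result before trusting the sequence call reflects the observation that serosubtypes III-3 and III-4 cannot be separated from MS Ia and MS II on the 790 bp region alone.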

Example 5 - Identification of regions in the alp2, alp3 and rib genes suitable
for protein antigen gene specific subtyping

Polymerase chain reactions. 

With few exceptions, all primer pairs produced amplicons of predicted
length from isolates giving positive results (Table 7). The exceptions included one
isolate that was positive by PCR using primer pairs GBS1360S/GBS1937A and
GBS1717S/GBS1937A (which both target bac gene) but produced amplicons
significantly longer than those of other bac gene-positive isolates. Sequencing
showed that the amplicon contained the insertion sequence IS1381 with minor
variations compared with the published sequences (Tamura et al., 2000). The
amplicons produced using primers IgAagGBS/RIgAagGBS and IgAS1/IgAA1
(also targeting bac gene) varied in length (Berner et al., 1999) and were
sequenced for further subtyping (see below and Table 8).

Amplicon sequencing results. 

To confirm the specificity of selected primer pairs that we had designed or
modified, we sequenced 10 of 23 amplicons produced by bcaS1/bcaA (targeting
the 5'-end of bca gene) and all of those produced by ribS1/ribA3 (targeting rib
gene) and GBS1360S/GBS1937A (targeting bac gene), from the two panels of
reference strains and 31 randomly selected clinical isolates.

All 10 amplicons of primers bcaS1/bcaA and 12 of 13 of primers
ribS1/ribA3 were identical with the corresponding gene sequences in GenBank
(M97256, bca gene and U58333, rib gene, respectively). One additional isolate,
namely Prague 25/60 in reference panel 2 (which is used to raise R antiserum),
produced an amplicon with primer pair ribS1/ribA3 only at a lower annealing
temperature (55°C) but not with ribS2/ribA1 and ribS2/ribA2. It was therefore
assumed not to contain rib gene, although the amplicon sequence showed
considerable homology with rib gene (71.4% or 66.6% according to whether or
not the primer sequences were included) (Figure 3). This isolate was the only
one, of 224 tested, for which PCRs were negative using ribS2/ribA1 and
ribS2/ribA2 but positive using ribS1/ribA3. The latter primer pair is assumed to be
not entirely specific for rib gene and was therefore used only for sequencing.

Four of 10 amplicons of primer pair GBS1360S/GBS1937A (targeting bac
gene) were identical with the corresponding sequence in GenBank (X58470,
X59771). A single point mutation (A to G, 1441 of X59771) was found in the
remaining six bac gene amplicons, including the one which contained the
insertion sequence IS1381 (see above and AF367974).

Amplicons from all of the 224 isolates that gave positive PCR results using
primer pairs bcaS1/balA (targeting alp2/alp3 genes), bal23S1/bal2A2 (targeting
alp2 gene) and IgAagGBS/RIgAagGBS (targeting bac gene) were sequenced.

Fifty isolates produced amplicons using primer pair bcaS1/balA. The
sequences of nine were identical with the corresponding portions of the published
sequence of alp2 gene (AF208158) and 41 with that of alp3 gene (AF291065).
There are two consistent heterogeneity sites between alp2 and alp3 genes in the
sequences of bcaS1/balA amplicons (Figure 4), which can be used to distinguish
them, in addition to alp2 and alp3 gene-specific PCR. All nine amplicons of
primer pair bal23S1/bal2A2 were identical with the corresponding portion of the
alp2 gene sequence in GenBank (AF208158).

The primer pair IgAagGBS/RlgAagGBS identified bac gene in 52 isolates.
There was considerable sequence variation, which allowed separation of bac
gene-positive isolates into 11 groups and 20 subgroups based on amplicon
length and sequence heterogeneity, respectively (Table 8). The groups contained
small numbers (one to five) of isolates except for B1 (20 isolates, 2 subgroups)
and B4 (11 isolates, 3 subgroups). The differences in amplicon length were
generally caused by the presence or absence of short repetitive sequences.

Further confirmation of specificity of surface protein gene-specific primer 
pairs. 

To confirm primer specificity, we compared the results of PCR using the 
primer sequences we had designed or modified for bac gene PCR, with those of 
PCR using previously published primers and found 100% correlation. 

The previously reported non-specificity of the published primer pair
bcaRUS/bcaRUA (targeting the bca gene repetitive unit) was confirmed. Using
these primers, all nine alp2 gene positive (bcaS1/bcaA negative) isolates and 53
which were PCR negative using the primers bcaS1/bcaA, bcaS2/bcaA (targeting
the 5'-end of bca gene), bal23S1/bal2A2 and bal23S2/bal2A1 (targeting the
5'-end of alp2 gene) produced amplicons. Our sequencing showed that bca gene
and alp2 gene have significant homology in the regions targeted by
bcaRUS/bcaRUA, allowing amplicon formation from alp2 gene-positive strains. These
false positive results could be due to the presence of other C alpha-like proteins,
containing regions homologous with the bca gene repetitive unit (bca gene
repetitive unit-like sequence).

We also showed that the results of PCR using two or more primer pairs that
we had designed for individual genes (rib, alp2 and alp3 genes) correlated well,
supporting the specificity of each set. The only exception, as mentioned above,
was ribS1/ribA3, which produced a non-specific amplicon from one of 224
isolates tested.

Example 6 - The relationship between surface protein antigen gene profiles 
and cps serotypes/serosubtypes. 

Surface protein gene profiles. 

For each gene (except bca gene repetitive unit or bca gene repetitive
unit-like region), we selected two primer pairs to identify and characterise GBS
surface protein by PCR. Each isolate was given a protein gene profile code
according to PCR results as follows:

"A": 5'-end of bca gene amplified by bcaS1/bcaA and bcaS2/bcaA;
"a" or "as": bca gene repetitive unit or bca gene repetitive unit-like region
amplified by bcaRUS/bcaRUA, with multiple or single band amplicons,
respectively;
"B": bac gene amplified by GBS1360S/GBS1937A and
IgAagGBS/RlgAagGBS (>20 subgroups based on sequence
heterogeneity);
"R": rib gene amplified by ribS2/ribA1 and ribS2/ribA2;
"alp2": alp2 gene amplified by bal23S1/bal2A2 and bal23S2/bal2A1; and
"alp3": alp3 gene amplified by bal23S1/bal3A and bal23S2/bal3A
(Table 7).
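
The coding scheme above can be sketched in Python. This is an illustration only, not part of the described method: the function name `protein_gene_profile`, the set-based input, the rule that a marker is called only when both of its primer pairs are positive, and the order in which codes are joined (inferred from profile names such as "AaB" and "alp2as") are all assumptions.

```python
# Primer pairs for each marker, copied from the profile-code definitions above.
MARKER_PRIMERS = {
    "A": ("bcaS1/bcaA", "bcaS2/bcaA"),
    "B": ("GBS1360S/GBS1937A", "IgAagGBS/RlgAagGBS"),
    "R": ("ribS2/ribA1", "ribS2/ribA2"),
    "alp2": ("bal23S1/bal2A2", "bal23S2/bal2A1"),
    "alp3": ("bal23S1/bal3A", "bal23S2/bal3A"),
}


def protein_gene_profile(positive_pairs, repeat_unit_bands=0):
    """Build a profile code from PCR results.

    positive_pairs: set of primer-pair names that produced an amplicon.
    repeat_unit_bands: number of bands seen with bcaRUS/bcaRUA (0 = negative).
    Assumption: a marker is called only if BOTH of its primer pairs are positive.
    """
    def has(marker):
        return all(pair in positive_pairs for pair in MARKER_PRIMERS[marker])

    code = ""
    # Assumed order: repeat-family gene first, then repeat-unit marker, then "B".
    for marker in ("alp2", "alp3", "R", "A"):
        if has(marker):
            code += marker
    if repeat_unit_bands > 1:
        code += "a"      # multiple-band amplicon
    elif repeat_unit_bands == 1:
        code += "as"     # single-band amplicon
    if has("B"):
        code += "B"
    return code or "none"


print(protein_gene_profile({"ribS2/ribA1", "ribS2/ribA2"}))  # R
```

Keeping the primer names in one table means a change of primer pair only has to be made in one place.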

Four common profiles accounted for 203 of 224 (90.6%) isolates: "R" (62
isolates), "AaB" (51 isolates), "a" (49 isolates) and "alp3" (41 isolates) (see
Table 4). Only two isolates contained no surface protein gene markers. All but
one isolate with the bac gene ("B") also had bca gene, with its repetitive unit
("Aa"); one had rib gene. All "alp2" isolates contained single bca repetitive
unit-like sequences ("as"). "A", "R", "alp2" and "alp3" were all mutually
exclusive. Sixty-two of 63 isolates with rib gene ("R") and 41 of 41 isolates with
alp3 gene had no other protein antigen markers.



The relationship between surface protein antigen gene profiles and cps
serotypes/serosubtypes.

A cps molecular serotype (MS) was assigned to all isolates in accordance
with the methods described in Examples 1 to 4 and the results correlated with
conventional serotyping (CS) results except for 19 of 224 isolates that were
nontypable using antisera. The relationship between surface protein gene profiles
and cps MS is summarised in Table 9.

The following strong associations were confirmed or demonstrated
between: MS Ia and bca gene repetitive unit or bca gene repetitive unit-like
sequence (most with profile "a"); MS serosubtypes III-1 and III-2 and rib gene; MS
serosubtype III-3 and alp2 gene; MS Ib and bca/bac genes; and MS V and alp3
gene. MS II showed the most varied surface protein gene profiles. However, the
relationships were not absolute and different combinations of cps serotypes and
protein gene profiles produced 31 different serovariants or 51 when bac gene
("B") subgroups were considered.

Example 7 - The relationship between surface protein antigens and protein 
gene profiles. 

Based on conventional serotyping, 33 isolates (belonging to CS Ia/c, Ib/c, IIc, IIb,
IIIc or IIIb) reacted with the C antiserum. The surface protein gene profiles of all
these isolates contained bca gene ("A") or bca gene repetitive unit-related
markers ("a" or "as"): Aa, 3; AaB, 18; a, 11; alp2as, 1. Twenty-nine isolates
reacted with the R antiserum and, of these, 22 contained rib gene and six, alp3
gene. The strain used to raise the R protein antiserum (Prague 25/60) contained
a presumed rib-like gene (see above and Figure 3).

Example 8 - Identification of mobile genetic elements suitable for molecular 
subtyping 

We developed a series of PCR primers to screen for the presence of five
mobile elements in GBS serotypes.

Specificity of primer pairs.

All the primer pairs produced amplicons of the expected lengths (Table 11)
from some reference and/or some clinical isolates (Table 12). To evaluate the
specificity of our primer pairs, we sequenced all amplicons produced by primers
IS1548S/IS1548A3 and ISSa4S/ISSa4A2, and amplicons, selected from both
reference and clinical isolates, produced by IS861S/IS861A2 (12 isolates),
IS1381S1/IS1381A (24 isolates) and GBSi1S1/GBSi1A2 (11 isolates).

All 41 IS1548 and 15 ISSa4 amplicon sequences were identical with the
corresponding sequences in GenBank (Y14270 and AF165983, respectively).
Five of 12 IS861 amplicon sequences were identical with the corresponding
IS861 sequence in GenBank (M22449). The other seven differed, at position 732,
from the published sequence (G to A) and the reference strain Prague 25/60 had
two additional differences - G to A and T to A - at positions 576 and 830 of
M22449, respectively.

Previously, we found a full-length insertion sequence IS1381 (AF367974)
within C beta antigen gene of a clinical isolate, with several differences compared
with the original published sequence (AF064785): the terminal inverted repeats
contained 15, rather than 20 base pairs (bp); there was a three bp deletion and
four individual bp differences in the putative transposase pseudogene between
positions 419 to 429 (of the original GenBank sequence) - GGG ATC CGA TT
(AF064785) vs CAG A- -GG TA (AF367974; our sequence). All amplicons of
primer pair IS1381S1/IS1381A from 12 reference and 12 selected clinical isolates
were identical with each other and with that of our IS1381 sequence in GenBank
(AF367974) but different, as above, from the original reported IS1381 sequence
(AF064785).

The amplicons of primer pair GBSi1S1/GBSi1A2 from all four GBSi1-positive
reference strains and seven selected clinical isolates were sequenced.
Six (including those of three reference strains) were identical with the
corresponding GBSi1 sequence in GenBank (AJ292930). Amplicons from four
clinical isolates showed three site-variations (C to T at position 767, A to C at
position 846 and T to C at position 923 of AJ292930 sequence). The reference
strain Prague 25/60 showed only the first two of these site-variations.

In addition to sequencing, we evaluated the specificity of our primer pairs
by comparing PCR results for two or more primer pairs for each target (Table 11).
In all cases, the same sets of isolates gave positive results when tested with PCR
targeting the same mobile genetic elements, thus confirming the specificity of the
primer pairs.

PCR results using specific primer pairs for all five mobile genetic elements. 

IS861, IS1548, IS1381, ISSa4 and GBSi1 were identified in 55%, 18%, 85%,
7% and 19% of isolates, respectively. None of the mobile elements was detected in
10 (4%) isolates. The distributions of the five mobile elements identified by PCR in
the 224 GBS isolates tested in the previous examples are shown in Table 12. IS1381
was detected alone in 79 isolates and GBSi1 alone in one. Forty-six isolates
contained two different insertion sequences (IS861 and IS1381, 42 isolates; IS1548
and IS1381, three isolates; ISSa4 and IS1381, one isolate). Forty-four isolates
contained three (IS861, IS1548 and IS1381, 34; IS861, ISSa4 and IS1381, 10) and
one contained all four insertion sequences. Forty-one isolates contained GBSi1 in
combination with one (IS861, 22; IS1381, one isolate), two (IS861 and IS1381, 11;
ISSa4 and IS1381, three isolates) or three (IS861, IS1548 and IS1381, four isolates)
insertion sequences.

PCR results for the 194 invasive isolates using specific primer pairs for all
five mobile genetic elements.

The numbers of isolates containing different mobile genetic element (mge)
combinations (from none to four per isolate) are shown in Table 13. IS1381, IS861,
IS1548, ISSa4 and GBSi1 were identified in 87%, 52%, 17%, 6% and 18% of
isolates, respectively. Six (3%) isolates contained no mge.

Example 9 - The relationships between cps serotypes, serosubtypes, surface 
protein gene profiles and mobile genetic elements. 

The distribution of each of the five mobile genetic elements in different cps
serotypes, serotype III subtypes and surface protein gene profiles is shown in
Tables 12 and 13. The most consistent findings for each sero/serosubtype were:

1) Serotype Ia - most (>80%) expressed proteins closely related to C alpha
protein and contained IS1381.

2) Serotype Ib - most (>90%) expressed C alpha and C beta proteins and
contained IS861 and IS1381.

3) Serotype II - exhibited two common patterns:

a) >50% expressed C alpha protein (and often C beta) and contained IS861,
IS1381 and sometimes other mobile elements, especially ISSa4; or

b) >25% expressed Rib protein and contained IS861, IS1381 and GBSi1.

4) Serosubtype III-1 - all expressed Rib protein and contained IS861, IS1548 and
IS1381 but not GBSi1.

5) Serosubtype III-2 - all expressed Rib protein and contained IS861 and GBSi1
but neither IS1548 nor IS1381.

6) Serosubtype III-3 - all expressed C alpha-like protein 2 and contained no
mobile genetic elements.

7) Serosubtype III-4 - expressed various proteins; all contained GBSi1.




8) Serotype IV - most expressed proteins closely related to C alpha protein
and contained IS1381.

9) Serotype V - most expressed C alpha-like protein 3 and contained IS1381.

10) GBSi1 and IS1548 were mutually exclusive in serotype III (III-1, III-2 and III-4)
but not in serotype II.

11) All isolates that expressed C alpha-like protein 2 contained no insertion
sequences.

Predominant relationships between MS/sst, pgp and mge. 

Figure 5 shows the relationships between the various genetic markers. IS1381
was present in nearly all isolates of MS Ia, Ib, IV, V and VI, but in none of sst III-2 or
III-3. IS1548 was found exclusively, and GBSi1 most commonly, in serotypes II or III;
three isolates (all MS II) contained both GBSi1 and IS1548. IS861 was found in all sst
III-1 and III-2 and most MS II and Ib isolates but only in 14% of other MS isolates.
ISSa4 was present in only 6% of isolates, more than half of which were MS II; it was
present in one invasive isolate obtained before 1996 (1994). IS1381 was found in
most isolates except those in cluster 8, pgp "alp2", which had no insertion sequences.
IS861 was found in most genotypes with pgp "AaB" (clusters 3 and 4) and all
genotypes with pgp "R" (clusters 6 and 7).


Genotypes based on MS/sst, pgp, bac subtypes and mge. 

MS/sst, pgp, bac subtype (for isolates with pgp "B") and the presence of
various combinations of mge provide a PCR/sequencing-based genotyping system.
The 194 invasive isolates in this study represented seven serotypes, ten MS/sst, 41
subtypes based on the distributions of pgp and mge or 56 genotypes when bac
subtypes (mainly in MS Ib) were included (Figure 5).
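
One way to picture how the marker sets combine into a genotype is the following Python sketch. It is illustrative only: the function `genotype_key`, the record layout and the four example records are hypothetical and are not data from the study; the only rule taken from the text is that the bac subtype is considered for isolates whose pgp contains "B".

```python
from collections import Counter


def genotype_key(ms_sst, pgp, mge, bac_subtype=None):
    """Combine the marker sets into one hashable genotype key.

    The bac subtype is kept only when the protein gene profile contains "B";
    mge is stored as a frozenset so that element order does not matter.
    """
    return (ms_sst, pgp, bac_subtype if "B" in pgp else None, frozenset(mge))


# Made-up records for illustration only; they are not isolates from the study.
isolates = [
    {"ms_sst": "III-1", "pgp": "R", "mge": ["IS861", "IS1548", "IS1381"]},
    {"ms_sst": "III-1", "pgp": "R", "mge": ["IS1381", "IS861", "IS1548"]},
    {"ms_sst": "Ib", "pgp": "AaB", "bac_subtype": "B1", "mge": ["IS861", "IS1381"]},
    {"ms_sst": "Ib", "pgp": "AaB", "bac_subtype": "B4", "mge": ["IS861", "IS1381"]},
]

genotype_counts = Counter(genotype_key(**record) for record in isolates)
print(len(genotype_counts))  # 3 distinct genotypes among the 4 made-up records
```

The two "Ib" records differ only in bac subtype, which is why including bac subtypes raises the number of distinct genotypes.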

Theoretical GBS clonal population structure. 

Theoretically there are 13 possible GBS MS/sst (eight MS - Ia, Ib, II, IV-VIII;
four sst - III-1 to III-4; and cps gene cluster absent) and at least 10 pgp (none, "Aa",
"AaB", "a", "as", "R", "RB", "alp2as", "alp3" or "alp4a"). If the 22 bac subgroups
identified so far are included, there are up to 31 pgp. If the five mge were
independently and randomly distributed, each being either present or absent, there
would be 13 × 31 × 2^5 = 12,896 different possible combinations of molecular
markers. The fact that only 56 different combinations were found (Figure 5)
demonstrates that markers are not randomly distributed or, in other words, these
invasive Australasian GBS isolates have a clonal population structure. It is possible,
but unlikely, that these isolates represent a very limited number of GBS genotypes.
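
The arithmetic behind the theoretical number of marker combinations can be reproduced in a few lines of Python. The variable names are ours; the counts (13 MS/sst, 31 pgp, five mge, 56 observed genotypes) are those given in the text.

```python
n_ms_sst = 13      # possible molecular serotypes/serosubtypes
n_pgp = 31         # protein gene profiles, including bac subgroups
n_mge = 5          # mobile genetic elements, each present or absent
observed = 56      # genotypes actually found among the invasive isolates

theoretical = n_ms_sst * n_pgp * 2 ** n_mge
print(theoretical)                        # 12896
print(f"{observed / theoretical:.2%}")    # 0.43%
```

Under the independence assumption, fewer than half a percent of the possible combinations were observed.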

The phylogenetic relationship of Australasian invasive GBS. 

The 56 genotypes formed eight clusters, separated at a genetic distance of
about 16 (or three cluster groups separated at a distance of about 22.5). The pgp
was the main determinant of cluster separation (Figure 5). Of the isolates, 94%
belonged to five MS (Ia, Ib, II, III and V), 62% belonged to five (9%) genotypes
(Ia-1, Ib-1, III-1, III-2, V-1) and 92% belonged to the five largest clusters (1, 2, 4, 6
and 7). Cluster group A, the largest, contained 139 (72%) isolates and 48 (86%)
genotypes, 45 of which contained fewer than five isolates, whereas cluster group
B contained 49 (25%) isolates and five (9%) genotypes.

The main characteristics of each cluster were as follows:
Cluster 1: "alp3", IS1381 (39 isolates, four MS, 11 genotypes; predominant
genotype V-1).
Cluster 2: "a" or "as", IS1381 (55 isolates, four MS, 12 genotypes; predominant
genotype Ia-1).
Cluster 3: "Aa" or "AaB", MS II, IS1381, IS861 (10 isolates, six genotypes).
Cluster 4: "AaB", IS1381, IS861 (35 isolates, two MS: VI or Ib; 18 genotypes;
predominant genotype Ib-1).
Cluster 5: "AaB", IS861, GBSi1, genotype III-4-1 (one isolate).
Cluster 6: "R", IS861 and GBSi1 (22 isolates, three MS/genotypes; predominant
genotype III-2).
Cluster 7: "R", IS1381 and IS861 (27 isolates; two MS/genotypes; predominant
genotype III-1).
Cluster 8: "alp2as", no IS (six isolates; three MS/genotypes; one contained
GBSi1).

The phylogenetic study showed that the dendrogram inferred by SSPS 
was very robust. 

The relationship between genotypes and GBS disease patterns. 

The distribution of MS and genotypes in different age groups of patients with
invasive GBS disease is shown in Table 14. All common MS were represented in
more than one patient group. However, there were highly significant associations
(when compared with all other age-groups) between sst III-2 and late onset neonatal
infection (p=0.0005) and MS V and infection in the elderly (p=0.001).

There were 17 isolates from cerebrospinal fluid specimens, nine (53%) of
which were MS III (from three different sst/genotypes, each in a different cluster). The
other eight isolates were distributed among five MS, seven genotypes and four
clusters. Meningitis occurred in all age-groups but comprised 23% of cases in the late
onset neonatal group compared with 5% in all other groups.

DISCUSSION 

Capsule production in GBS is controlled by the capsular polysaccharide
synthesis (cps) gene cluster, which had been sequenced for serotype Ia and
serotype III before we began our study. Corresponding sequences for serotype Ib
(Miyake et al., 2001 submitted into GenBank, GenBank accession number:
AB050723), and for serotypes IV, V, and VI (McKinnon et al., 2001 submitted into
GenBank, GenBank accession numbers: AF355776, AF349539, AF337958,
respectively) were released recently when the project was nearly finished, but
the sequences of cps gene clusters for the other three serotypes (II, VII and VIII)
have not been published previously.

The sequences of cps gene clusters for serotypes Ia and III showed
considerable homology at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG.
We designed a series of primers to amplify a 2226/2217 bp segment in this
region and found that amplicons were obtained from all serotypes except VIII.
This confirmed a previous suggestion that serotype VIII is significantly different
from other serotypes in this region.

Using eight serotype (Ia to VII) reference strains, we showed more than 50
heterogeneity points between serotypes (Figure 1, Table 4). Using 63 selected
clinical isolates that had been serotyped by conventional methods, we found that
these inter-serotype differences were generally consistent and specific, especially
the 23 sites clustered at the 3'-end of the regions. We used these differences to
assign serotypes to the remaining clinical isolates collected in this study, without
knowledge of the serotype obtained by conventional methods.

Sequence analysis of the 3'-end of cpsG-cpsH-cpsI/cpsM for serotypes Ia,
III, Ib, IV, V and VI showed that this region is highly variable (Figure 3), making
it a suitable target for direct serotype identification by PCR. We
designed several pairs of MS-specific primers for MS Ia, Ib, III, IV, V and VI and
used them to test two CS reference panels. Selected primer pairs were used for
MS, by PCR alone, of 86.9% of our 206 clinical isolates. Using rapid-cycle
MS-specific PCR, results are available within one working day. In future, it will be
possible to extend this method to all MS, when cps gene cluster sequences in
this region are available for serotypes II, VII and VIII. Meanwhile, MS II and VII
can be identified by sequencing the 790 bp PCR amplicons of the 3'-end of
cpsE-cpsF-the 5'-end of cpsG (Figure 1, Table 4). A positive GBS-specific PCR and
negative PCR results with all the primers that amplify the 790 bp region identify
MS VIII, by exclusion.
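
The decision flow described in this paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not the laboratory protocol: the function name `assign_ms`, its parameters and the fallback labels "not GBS", "ambiguous" and "unresolved" are hypothetical additions for cases the text does not address.

```python
# Serotypes for which MS-specific primers are described in the text.
MS_SPECIFIC = {"Ia", "Ib", "III", "IV", "V", "VI"}


def assign_ms(gbs_pcr_positive, ms_specific_hits, amplicon_790_present,
              sequenced_serotype=None):
    """Assign a molecular serotype (MS) from PCR and sequencing results.

    gbs_pcr_positive: result of the GBS-specific PCR.
    ms_specific_hits: serotypes whose MS-specific PCR was positive.
    amplicon_790_present: whether any primers for the 790 bp region amplified.
    sequenced_serotype: serotype read from the 790 bp sequence, if sequenced.
    """
    if not gbs_pcr_positive:
        return "not GBS"                 # assumed label
    hits = set(ms_specific_hits) & MS_SPECIFIC
    if len(hits) == 1:
        return next(iter(hits))          # typed by MS-specific PCR alone
    if len(hits) > 1:
        return "ambiguous"               # assumed label
    if not amplicon_790_present:
        return "VIII"                    # by exclusion, as described above
    if sequenced_serotype in {"II", "VII"}:
        return sequenced_serotype        # typed by sequencing the 790 bp amplicon
    return "unresolved"                  # assumed label


print(assign_ms(True, ["V"], True))      # V
print(assign_ms(True, [], False))        # VIII
```

Trying MS-specific PCR first and sequencing only when it is negative mirrors the order in which the text presents the two approaches.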

In future, and in some laboratories currently, sequencing of the 790 bp
PCR amplicons of the 3'-end of cpsE-cpsF-the 5'-end of cpsG for all isolates may
be more convenient, as only one method and fewer primers are needed.
However, if sequencing is not available in-house, the turn-around time is longer
and a small proportion of serotypes would be wrongly assigned (serosubtypes
III-3 and III-4 as MS Ia and II, respectively). This could be avoided by screening with
MS III-specific PCR first. Sequencing the 790 bp PCR amplicon allows MS III to
be subtyped on the basis of the sequence heterogeneity.

Previous studies have shown that serotypes Ia, Ib, II, III, and V are those
most frequently isolated from normally sterile sites, in the United States and
several other countries. Serotypes VI and VIII are the predominant serotypes isolated
from patients in Japan, but are uncommon elsewhere. Although our isolates were
selected, they were probably representative of those causing disease in
Australasia; Ia, Ib, II, III, and V were the most common serotypes identified,
although there were small numbers of serotypes IV, VI and VIII.

Up to 13% of GBS isolates are non-serotypable and in our study the
proportion was 8.7% (18/206) using the antisera available. This may be due to
decreased type-specific-antigen synthesis; non-encapsulated phase variation; or
insertion or mutation in genes of cps gene clusters. One non-serotypable GBS
strain in our study had a T base deletion in cpsG gene, which caused a change in
the cpsG gene reading frame.

We have also developed PCR-based methods to identify GBS surface
protein genes and further characterise these isolates. Using the published bac
gene sequence, we modified bac gene-specific primers and designed new
primers, with high melting temperatures (>70 °C) suitable for rapid cycle PCR
targeting all major surface protein genes.

As previously reported, a published PCR primer pair targeting the bca
gene repetitive unit (at the 3'-end of bca gene) was not entirely specific for bca
gene. We designed two new primer pairs targeting the 5'-end of bca gene, to
improve the specificity. However, very few serotype Ia strains gave positive
results using these primers whereas all were PCR positive using primers
targeting the bca gene repetitive unit. These results were consistent with a
previous report, that a probe targeting the 5'-end of bca gene hybridized with only
one of nine serotype Ia strains, but a large bca gene probe, including the tandem
repeat region, hybridized with all nine strains.

PCR specific for rib, alp2 and alp3 genes has not been described
previously. The primer pairs we designed mainly targeted the 5'-ends of the gene
and were chosen after comparing the gene heterogeneity with related gene
sequences. We designed two or more primer pairs for each gene to check primer
specificity by comparison of results of different PCR targeting the same genes.
Protein gene profiles "alp2" and "alp3" were distinguished on the basis of the alp2
and alp3 gene-specific PCR and/or two sequence heterogeneity sites in the
amplicons of bcaS1/balA or bcaS2/balA.

To confirm the specificity of our primers, we used them to examine two
reference panels and selected GBS isolates. The longest amplicons produced by
PCR for each gene were sequenced, to provide maximal sequence information
and ensure that the inner primers were not located at strain heterogeneity sites.
Our sequencing results confirmed the specificity of the primers. Two pairs of
primers for each gene were compared, with similar results. Finally, six
gene/region specific primer pairs (including the one targeting the bca gene
repetitive unit) were used to define protein antigen gene profiles for all 224
isolates.

The study showed that only one member of the surface protein gene family
containing repetitive sequences - rib, bca, alp2, and alp3 genes - could be present
in any single isolate. However, all isolates containing bac gene, which is not a
member of the surface protein gene family containing repetitive sequences, also
contained either bca gene (51/52) or rib gene (1/52).

Bac gene was present in 23% of isolates, a similar proportion to that
(19-22%) previously reported. In common with others, we found variations in the bac
gene due to variable small internal repetitive sequences. These bac gene
repetitive sequences were irregular (unlike those of the bca-rib gene family).
Their role is not clear, but they are potentially useful molecular markers for
epidemiological studies.

Our data show that some serotype III isolates (our MS serosubtypes III-1
and III-2) were closely associated with rib gene, and others (our MS serosubtype
III-3) with alp2 gene. Serotype Ib was associated with bca and bac genes and
serotype V with alp3 gene. However, as the relationship was not absolute,
different combinations of cps serotypes-serosubtypes/protein gene profiles
identified many serovariants, which will be useful in epidemiological studies and
in formulation of conjugate vaccines. Based on PCR only, we were able to divide
our 224 isolates into 31 serovariants based on bac gene (B) groups or 51, based
on subgroups. Theoretically, there are likely to be additional serovariants.

We found that the antisera to "c" and "R" protein antigens were not entirely
specific for any particular protein genes. However, reaction with "c" antiserum
mostly reflected the presence of genes encoding C alpha (bca gene) and related
protein antigens (at least including alp2 gene) and the antiserum to "R" with those
encoding Rib (rib gene) and related proteins (at least including alp3 gene, and the
rare presumed rib-like gene).

We have also investigated the presence of a number of mobile elements in
different serotypes of GBS. Four different insertion sequences have been
identified previously in GBS. Multiple copies of IS861 in some serotype III
isolates were associated with increased capsule gene expression. We found
IS861 in all serosubtypes III-1 and III-2 and most serotype II and Ib isolates but
few others. All IS861-containing isolates contained at least one additional mobile
element.

Multiple copies of IS1381 have been found in a high proportion of GBS and
other Streptococcus species, including S. pneumoniae, and have been used as probes
for restriction fragment length polymorphism (RFLP) analysis of GBS for
epidemiological studies (Tamura et al., 2000). We found IS1381 in 85% of
isolates overall. It was present in all isolates of serosubtype III-1 but none of
serosubtypes III-2 or III-3. Our IS1381 sequences, from 24 isolates, were identical
with each other, but differed at several sites from that previously described
(AF064785). The significance of these differences is unknown, but it emphasizes
the importance of confirming sequences from as many different strains as
possible.

ISSa4 was first identified in a nonhemolytic GBS isolate, in which it caused
insertional inactivation of the gene cylB, which is part of an ABC transporter
involved in production of hemolysin. Only a small proportion of (mainly hemolytic)
GBS isolates (4%) contained ISSa4, all of which had been isolated since 1996,
and it was postulated that ISSa4 had been newly acquired by GBS. We also
found ISSa4 in only a small proportion of isolates (7%) but it was present in
similar proportions of clinical isolates obtained before (4 of 44) and during or after
(11 of 162) 1996.
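
The two proportions quoted above (4 of 44 and 11 of 162) can be compared with a two-sided Fisher exact test. The text does not say which statistical test, if any, was applied to these counts, so the choice of test and the helper `fisher_exact_two_sided` are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x positives in the first row.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


before_1996 = (4, 44 - 4)        # ISSa4 positive, ISSa4 negative
from_1996 = (11, 162 - 11)

p_value = fisher_exact_two_sided(*before_1996, *from_1996)
print(f"{4 / 44:.1%} vs {11 / 162:.1%}")   # 9.1% vs 6.8%
print(p_value > 0.05)                      # True
```

A p-value above 0.05 is consistent with the statement that ISSa4 was present in similar proportions of isolates in the two periods.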

IS1548 was first discovered in some hyaluronidase-negative GBS
serotype III isolates, in which it caused insertional inactivation of the gene hylB
(one of a cluster responsible for production of hyaluronidase, an important GBS
virulence factor) (Granlund et al., 1998). A copy of IS1548 is also found
downstream of the C5a peptidase gene (also associated with virulence), in
isolates that contain it. Most IS1548-containing isolates were from patients with
endocarditis and it was postulated that inactivation of hyaluronidase production
and/or some effect on C5a peptidase may allow GBS isolates to adhere to and
survive on heart valves.

We found IS1548 in all serosubtype III-1 isolates, which represented 52%
of 58 serotype III isolates in our collection, from superficial (eight of 12) and
normally sterile (22 of 46) specimens. The latter were from neonates (seven of
20), adults (three of six) and subjects of unspecified age (12 of 20) (data not
shown). Although specific clinical data were unavailable, GBS endocarditis is
uncommon and likely to have been present in few, if any, of these subjects.
Further study is required to elucidate the association of this insertion sequence
with specific virulence factors and clinical syndromes.

We found GBSi1, a group II intron, in 19% of our 224 isolates overall; it
was commonly associated with IS861, and the distribution varied with
serotype/serosubtype. It was rarely found in serotypes other than II and III. It was
present in more than 50% of serotype II isolates, including four which also
contained IS1548. It was found in all serosubtype III-2 and III-4 isolates, in
which IS1548 was not found, but in no serosubtype III-1 isolates, which did
contain IS1548, or serosubtype III-3 isolates, which did not.

Our subdivision of GBS serotype III into four serosubtypes, based on
differences within the cps gene cluster, was supported by corresponding
differences in surface protein gene profiles and distribution of the five mobile
elements described in this study. Although we did not test our isolates for
hyaluronidase activity, it is likely that our serosubtype III-1, which expresses Rib
protein and contains IS1548, IS861 and IS1381, corresponds with the
hyaluronidase negative subtype III-2, described by Bohnsack et al., 2001. Our
serosubtype III-2 also expresses Rib protein and contains IS861 and GBSi1 and
probably corresponds with subtype III-3 of Bohnsack et al., 2001. Serosubtypes
III-3 and III-4 were represented by relatively few isolates. The former (in common
with some serotype Ia isolates) expressed the C alpha-like protein 2 and
contained no mobile elements (an otherwise uncommon finding). The latter is
closely related to serotype II, with which it shares sequence homology in a
section of the cps gene cluster and various surface protein profiles and mobile
elements.

Summary

Our aim has been to develop a comprehensive genotyping system for group B
streptococcus (GBS). Such a system should ideally be reproducible, objective and
transportable between laboratories, comparable with and complementary to other
typing methods and able to incorporate known virulence markers. Based on these
criteria, we first developed a molecular serotyping (MS) method based on the cps
gene cluster. It compared favourably with, but was more sensitive than, conventional
serotyping (CS) and allowed us to identify several subtypes of serotype (sst) III, as
described by others. We have also developed a second molecular subtyping method
based on the family of genes encoding variable surface protein antigens
(bca/rib/alp2/alp3/alp4) and the IgA binding protein C beta (bac). It is more sensitive
and objective than conventional protein serotyping, which cannot type all isolates and
is sometimes misleading. Our methods can also identify more members of the family
of variable antigen genes and distinguish numerous bac subgroups. A third
subtyping method uses five mobile genetic elements (mge), including four different
insertion sequences (IS) and a group II intron, which have been identified in GBS. The
use of this third method further enhances the discriminatory ability of our genotyping
system.

We then used our typing system to examine the population genetic structure 
and age-related disease distribution of genotypes among 194 invasive GBS isolates. 

We used mainly invasive GBS isolates to demonstrate the practical value of 
our genotyping system, confirm their clonal population structure and determine the 
distribution of genotypes in different patient groups. The isolates originated from 
patients of all ages with GBS sepsis. About half were consecutive GBS isolates from 
blood or CSF, at a large diagnostic laboratory in a general adult hospital, with an 
obstetric unit (i.e. there were no isolates from children other than neonates). The rest
were consecutive isolates referred for serotyping from all over New Zealand. Thus the 
overall age distribution is representative of that in the population affected by GBS 
disease, except that children beyond the early neonatal period are probably under- 
represented. However, the distribution of genotypes within each age-group should be 
representative. 

Among our 194 Australasian invasive GBS isolates we identified 56
genotypes, of which five (Ia-1, Ib-1, III-1, III-2 and V-1) accounted for 62% of isolates.

The phylogenetic tree derived from our results showed relationships between 
cps serotype and protein gene profiles (pgp). Our results also show that certain 
known virulence markers - C beta, C alpha variants and hyaluronidase production 
(indirectly) - were associated with distinct clonal lineages.

Our genotyping system, based on three sets of genetic markers, is highly
discriminatory. Because it provides useful phenotypic data, including antigenic
composition, it will be useful for epidemiological surveillance of GBS, especially in
relation to potential GBS vaccine use. Study of the relationships between
putative high-virulence genotypes and patient characteristics (age and/or
underlying risk factors), and whether there are significant differences between
CSF isolates (or genotypes) and other invasive or colonising strains, will be
facilitated by our genotyping system. Using this system, we have demonstrated a
clonal population structure among invasive Australasian GBS isolates. This
system will be applied to colonising GBS isolates, to identify markers of virulence.

Thus, we have developed an alternative to conventional serotyping for
GBS, which is accurate and reproducible, can be performed by any laboratory
with access to PCR/sequencing and, importantly, does not require panels of
serotype-specific antisera that are increasingly difficult to maintain. All isolates
are serotypable and sequencing of a relatively limited 790 bp region can provide
additional serosubtyping information for MS III. The molecular methods we have
described for serotype identification, together with the protein profiling (or protein
antigen subtyping) and identification of mobile genetic elements (or mobile
genetic element subtyping), provide potentially useful markers for further
phylogenetic and epidemiological studies of GBS, as well as comprehensive strain
identification that will be useful for the epidemiological and other related studies
needed to monitor GBS isolates before and after introduction of GBS
conjugate vaccines.

The various features and embodiments of the present invention, referred to in
individual sections above, apply, as appropriate, to other sections, mutatis
mutandis. Consequently, features specified in one section may be combined with
features specified in other sections, as appropriate.

All publications mentioned in the above specification are herein
incorporated by reference. Various modifications and variations of the described
methods and system of the invention will be apparent to those skilled in the art
without departing from the scope and spirit of the invention. Although the
invention has been described in connection with specific preferred embodiments,
it should be understood that the invention as claimed should not be unduly limited
to such specific embodiments. Indeed, various modifications of the described
modes for carrying out the invention which are readily apparent to those skilled in
molecular biology or related fields are intended to be within the scope of the
following claims.

Table 1. GBS reference panels used in this study.

Lab strain number            Source     Serotype          MS/serosubtype   GenBank accession number

Reference panel 1¹
090                          Channing   Ia                Ia               AF332893
H36B                         Channing   Ib                Ib               AF332903
18RS21                       Channing   II                II               AF332905
M781                         Channing   III               III-2³           AF332896
3139                         Channing   IV                IV               AF332908
CJB111                       Channing   V                 V                AF332910
SS1214                       Channing   VI                VI               AF332901
7271                         Channing   VII               VII              AF332913
JM9 130013                   Channing   VIII              VIII

Reference panel 2²
NZRM 908 (NCDC SS615)        ESR        Ia                Ia               AF332894
NZRM 909 (NCDC SS618)        ESR        Ib                Ib               AF332904
NZRM 910 (NCDC SS700)        ESR        Ic                Ia               AF332914
NZRM 911 (NCDC SS619)        ESR        II                II               AF332906
NZRM 912 (NCDC SS620)        ESR        III               III-3³           AF332897
NZRM 2217 (Prague 25/60)     ESR        Non-typable (R)   II               AF332907
NZRM 2832 (Prague 1/82)      ESR        IV                IV               AF332909
NZRM 2833 (Prague 10/84)     ESR        V                 V                AF332911
NZRM 2834 (Prague 118754)    ESR        VI                VI               AF332902

Notes.

1. Reference panel 1: supplied by Dr Lawrence Paoletti, Channing Laboratory,
Boston, USA.

2. Reference panel 2: New Zealand Reference Medical Culture Collection strains
supplied by Dr Diana Martin, ESR, Porirua, Wellington, New Zealand.

3. MS III serosubtypes based on sequence heterogeneity; see text for more
detail.

Table 3. Specificity and expected lengths of amplicons obtained using different
oligonucleotide primer pairs.

Primer pairs*            Specificity             Length of amplicons (base pairs)

Sag59/Sag190             GBS (S. agalactiae)     196
CFBS/CFBAᵃ               GBS (S. agalactiae)     241
16SO/23SA                GBS (S. agalactiae)     433
DSF2/DSR1                GBS (S. agalactiae)     276
cpsDS/cpsEA1             serotypes Ia to VII     449/458
cpsES/cpsEA2             serotypes Ia to VII     424
cpsES1/cpsEA3            serotypes Ia to VII     505
cpsES2/cpsEFA            serotypes Ia to VII     515
cpsES3/cpsFA             serotypes Ia to VII     450
cpsFS/cpsGA1             serotypes Ia to VII     423
cpsES3/cpsGA1            serotypes Ia to VII     790
cpsGS/cpsIA              serotypes Ia and III    1672/1558
cpsGS1/cpsIA             serotypes Ia and III    1662/1548
cpsGS/IacpsHA1           serotype Ia             1127
cpsGS1/IacpsHA1          serotype Ia             1117
IacpsHS/IacpsHA          serotype Ia             296
IacpsHS/IacpsHA1         serotype Ia             574
IacpsHS1/cpsIA           serotype Ia             354
cpsGS/IbcpsHA1           serotype Ib             1468
cpsGS1/IbcpsHA1          serotype Ib             1458
cpsGS/IbcpsIA            serotype Ib             1660
cpsGS1/IbcpsIA           serotype Ib             1650
IbcpsHS/IbcpsHA          serotype Ib             282
IbcpsHS1/IbcpsHA1        serotype Ib             349
IbcpsHS2/IbcpsIA         serotype Ib             347
IbcpsIS/IbcpsIA1ᶜ        serotype Ib             523
cpsGS/IIIcpsHA           serotype III            1063
cpsGS1/IIIcpsHA          serotype III            1053
IIIVIcpsHS/IIIcpsHA      serotype III            543
IIIcpsHS/cpsIAᶜ          serotype III            641
cpsGS/IVcpsHA            serotype IV             1372
cpsGS1/IVcpsHA           serotype IV             1362
cpsGS/IVcpsMA            serotype IV             1686
cpsGS1/IVcpsMA           serotype IV             1676
IVcpsHS/IVcpsHA          serotype IV             400
IVcpsHS1/IVcpsMAᶜ        serotype IV             379
cpsGS/VcpsHA1            serotype V              1096
cpsGS1/VcpsHA1           serotype V              1086
cpsGS/VcpsMA             serotype V              1682
cpsGS1/VcpsMA            serotype V              1672
VcpsHS/VcpsHA            serotype V              349
VcpsHS1/VcpsHA1          serotype V              401
VcpsHS2/VcpsMAᶜ          serotype V              374
IIIVIcpsHS1/VIcpsHA      serotype VI             398
cpsGS/VIcpsHA1           serotype VI             1205
cpsGS1/VIcpsHA1          serotype VI             1195
cpsGS/VIcpsIA            serotype VI             1527
cpsGS1/VIcpsIA           serotype VI             1517
VIcpsHS/VIcpsHA1ᶜ        serotype VI             327
VIcpsHS1/VIcpsIA         serotype VI             360

Notes.

*See Table 2 for primer sequences and Figure 1 for some primer sites.
Primers used in the algorithm for molecular serotype identification (Figure 2):
a. to identify GBS; b. for sequencing; c. for MS-specific PCR.

Table 5. Comparison of the results of conventional serotyping (CS) and
molecular serotype identification (MS)/subtyping of 206 clinical GBS
isolates.

               MS/serosubtype
CS             Ia    Ib    II    III-1¹  III-2¹  III-3¹  III-4¹  IV    V     VI    VIII

Ia             38    -     -     -       -       -       -       -     -     -     -
Ib             -     30    -     -       -       -       -       -     -     -     -
II             -     -     25    -       -       -       -       -     -     -     -
III            -     -     -     27      20      4       3       -     -     -     -
IV             -     -     -     -       -       -       -       7     -     -     -
V              -     -     -     -       -       -       -       -     31    -     -
VI             -     -     -     -       -       -       -       -     -     2     -
VIII           -     -     -     -       -       -       -       -     -     -     1
NT¹            2     5     1     3       1       -       -       -     5     1     -
Total (206)²   40    35    26²   30      21²     4       3       7     36    3     1

Notes.

1. For details of MS III serosubtypes see text.

2. One mixed culture was included as two separate isolates (one serotype II, one
subtype III-2).
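
The agreement summarised in Table 5 can be checked with a few lines of Python. This is only a sketch: the dictionary below is a transcription of the table counts, and the helper names are introduced here for illustration. It counts how many isolates were typable by CS and how many of those received the same serotype by MS (with MS III serosubtypes collapsed to serotype III).

```python
# Counts transcribed from Table 5: CS result -> {MS result: number of isolates}.
TABLE5 = {
    "Ia": {"Ia": 38},
    "Ib": {"Ib": 30},
    "II": {"II": 25},
    "III": {"III-1": 27, "III-2": 20, "III-3": 4, "III-4": 3},
    "IV": {"IV": 7},
    "V": {"V": 31},
    "VI": {"VI": 2},
    "VIII": {"VIII": 1},
    "NT": {"Ia": 2, "Ib": 5, "II": 1, "III-1": 3, "III-2": 1, "V": 5, "VI": 1},
}


def serotype_of(ms):
    """Collapse an MS serosubtype such as 'III-2' to its serotype 'III'."""
    return ms.split("-")[0]


def summarise(table):
    """Return (all isolates, isolates typable by CS, CS-typable isolates with matching MS)."""
    total = sum(sum(row.values()) for row in table.values())
    cs_typable = sum(sum(row.values()) for cs, row in table.items() if cs != "NT")
    concordant = sum(
        n
        for cs, row in table.items() if cs != "NT"
        for ms, n in row.items() if serotype_of(ms) == cs
    )
    return total, cs_typable, concordant


total, cs_typable, concordant = summarise(TABLE5)
nontypable_by_cs = total - cs_typable
```

With the counts as transcribed, every CS-typable isolate falls on the diagonal, and the remaining isolates are the CS non-typable (NT) ones to which MS assigned a serotype.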
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Table 7. Specificity and expected lengths of amplicons of using different 
primer pairs.

Primer pairs*            Specificity                      Length of amplicons   Protein profile
                                                          (base pairs)          code

IgAagGBS/RIgAagGBS       bac                              532-838               B
IgAS1/IgAA1              bac                              303-591               B
GBS1360S/GBS1937A        bac                              652                   B
GBS1717S/GBS1937A        bac                              292                   B
bcaS1/bcaA               5'-end of bca                    390                   A
bcaS2/bcaA               5'-end of bca                    342                   A
bcaRUS/bcaRUA            bca repetitive unit/             235                   a/as
                         bca repetitive unit-like region
bcaS1/balA               alp2/alp3                        446                   alp2 or alp3
bcaS2/balA               alp2/alp3                        398                   alp2 or alp3
balS/balA                alp2/alp3                        302                   alp2 or alp3
bal23S1/bal2A1           alp2                             334                   alp2
bal23S2/bal2A1           alp2                             253                   alp2
bal23S1/bal2A2           alp2                             426                   alp2
bal23S2/bal2A2           alp2                             345                   alp2
bal23S1/bal3A            alp3                             321                   alp3
bal23S2/bal3A            alp3                             240                   alp3
#ribS1/ribA3             rib/rib-like                     355                   R/r
ribS2/ribA1              rib                              194                   R
ribS2/ribA2              rib                              225                   R
ribS2/ribA3              rib                              333                   R

Notes.

*See Table 6 for primer sequences.

#For sequencing use only, not entirely specific for rib gene (see text for more
detail).

Table 8. Genetic groups and subgroups of bac gene (C beta protein gene)
based on amplicon length (using primers IgAagGBS/RIgAagGBS) and
sequence heterogeneity.

Group or    N=    Amplicon   GenBank      No. of different sites    Molecular serotype/
subgroup          length     accession    compared with (c.f.)      serosubtypes*
                             number       main group

B1          19    532        X58470                                 17 = Ib; 2 = II
B1a         1     532        AF362686     1 (c.f. B1)               Ib
B2          3     550        AF362687                               Ib, II, III-4
B3          2     586        AF362688                               2 = Ib
B3a         1     586        AF362689     4 (c.f. B3)               V
B3b         1     586        AF362690     21 (c.f. B3)              VI
B3c         1     586        AF362691     24 (c.f. B3)              Ib
B4          8     604        AF362692                               4 = Ib; 4 = II
B4a         1     604        AF362693     1 (c.f. B4)               II
B4b         2     604        AF362694     2 (c.f. B4)               2 = Ib
B5          2     622        AF362695                               Ia, VI
B5a         1     622        AF362696     2 (c.f. B5)               Ia
B6          1     640        AF362697                               Ib
B7          1     658        AF362698                               Ib
B7a         1     658        AF362699     34 (c.f. B7)              VI
B8          1     712        AF362700                               Ib
B9          2     748        AF362701                               2 = II
B9a         1     748        AF362702     13 (c.f. B9)              Ib
B10         2     820        AF362703                               2 = Ib
B11         1     838        AF362704                               Ib

Note.

*See Table 9 for further details of serotype/serosubtype relationships with protein
antigens.

Table 9. The relationship between GBS protein gene profiles and capsular
polysaccharide (cps) molecular serotypes/serosubtypes.

Serotype/       N=     None   Aa    AaB   R     alp3   a     as    alp2as   RB    Ra
serosubtype*

Ia              43     -      -     2     -     -      35    3     3        -     -
Ib              37     -      1     35    -     1      -     -     -        -     -
II              29     -      3     10    8     2      5     -     -        -     1
III-1           30     -      -     -     30    -      -     -     -        -     -
III-2           22     -      -     -     22    -      -     -     -        -     -
III-3           5      -      -     -     -     -      -     -     5        -     -
III-4           3      -      -     1     -     1      -     -     1        -     -
IV              9      -      -     -     1     -      8     -     -        -     -
V               38     1      -     -     1     35     -     -     -        1     -
VI              5      -      1     3     -     -      1     -     -        -     -
VII             1      -      -     -     -     1      -     -     -        -     -
VIII            2      1      -     -     -     1      -     -     -        -     -
Total           224    2      5     51    62    41     49    3     9        1     1

Note.

*See text for explanation of cps serosubtypes and Table 7 for explanation of
protein antigen gene profile codes.

Table 10. Oligonucleotide primers used in this study.

Primer     Target   Tm°C¹   GenBank accession    Sequence²
                            numbers

IS861S     IS861    77.4    M22449               445 GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG 479
IS861A1    IS861    77.3    M22449               831 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C 795
IS861A2    IS861    76.1    M22449               1020 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG 985
IS1548S    IS1548   76.5    Y14270               143 CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC 178
IS1548S1   IS1548   77.0    Y14270               539 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG 574
IS1548A1   IS1548   77.0    Y14270               574 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC 539
IS1548A2   IS1548   70.3    Y14270               915 CCC AAT ACC ACG TAA CTT ATG CCA TTT G 888
IS1548A3   IS1548   78.0    Y14270               930 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC 893
IS1381S1   IS1381   80.1    AF064785/AF367974    272/818 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG 307/853
IS1381S2   IS1381   81.7    AF064785/AF367974    497/1040 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG 526/1069
IS1381A    IS1381   73.1    AF064785/AF367974    881/1424 CTA AAA TCC TAG 770 ACG GTT GAT CAT TCC AGC 849/1392
ISSa4S     ISSa4    78.5    AF165983             326 CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C 359
ISSa4A1    ISSa4    75.2    AF165983             639 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G 606
ISSa4A2    ISSa4    74.5    AF165983             780 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC 745
GBSi1S1    GBSi1    78.6    AJ292930             721 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG 758
GBSi1S2    GBSi1    77.3    AJ292930             789 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC 824
GBSi1A1    GBSi1    83.9    AJ292930             1058 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC 1024
GBSi1A2    GBSi1    80.5    AJ292930             1161 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG 1127

Notes.

1. The primer Tm values were provided by the primer synthesiser (Sigma-Aldrich).

2. Numbers represent the numbered base positions at which primer sequences
start and finish (numbering start point "1" refers to the start point "1" of
corresponding gene GenBank accession number).

Table 11. Specificity and expected lengths of amplicons obtained using different
oligonucleotide primer pairs.

Primer pairs*         Specificity   Length of amplicons (base pairs)

IS861S/IS861A1        IS861         387
IS861S/IS861A2        IS861         576
IS1548S/IS1548A1      IS1548        432
IS1548S/IS1548A2      IS1548        773
IS1548S/IS1548A3      IS1548        788
IS1548S1/IS1548A2     IS1548        377
IS1548S1/IS1548A3     IS1548        392
IS1381S1/IS1381A      IS1381        610/607#
IS1381S2/IS1381A      IS1381        385
ISSa4S/ISSa4A1        ISSa4         314
ISSa4S/ISSa4A2        ISSa4         455
GBSi1S1/GBSi1A1       GBSi1         338
GBSi1S1/GBSi1A2       GBSi1         441
GBSi1S2/GBSi1A1       GBSi1         270
GBSi1S2/GBSi1A2       GBSi1         373

Notes.

*See Table 10 for primer sequences.

# Our sequencing result (GenBank accession number: AF367974) was 3 bp
shorter than that previously described by Tamura et al., 2000 (GenBank accession
number: AF064785).
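
The amplicon lengths in Table 11 appear to follow directly from the primer start positions given in Table 10: the start position of the antisense primer minus the start position of the sense primer, plus one. The Python sketch below checks this for a few primer pairs; the dictionary and function names are introduced here for illustration only. It also checks, as a sanity test of the transcribed sequences, that IS1548A1 is the reverse complement of IS1548S1 (both span positions 539-574 of Y14270).

```python
# 5' start positions on the numbered GenBank strand, transcribed from Table 10.
SENSE_START = {"IS861S": 445, "IS1548S": 143, "IS1548S1": 539,
               "ISSa4S": 326, "GBSi1S1": 721, "GBSi1S2": 789}
ANTISENSE_START = {"IS861A1": 831, "IS861A2": 1020, "IS1548A1": 574,
                   "IS1548A2": 915, "IS1548A3": 930, "ISSa4A1": 639,
                   "ISSa4A2": 780, "GBSi1A1": 1058, "GBSi1A2": 1161}


def amplicon_length(sense, antisense):
    """Expected amplicon length (bp) spanned by a sense/antisense primer pair."""
    return ANTISENSE_START[antisense] - SENSE_START[sense] + 1


def reverse_complement(seq):
    """Reverse complement of a DNA sequence; spaces between triplets are ignored."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.replace(" ", "")))


IS1548S1 = "GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG"
IS1548A1 = "CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC"

lengths = {
    ("IS861S", "IS861A1"): amplicon_length("IS861S", "IS861A1"),
    ("IS1548S", "IS1548A3"): amplicon_length("IS1548S", "IS1548A3"),
    ("ISSa4S", "ISSa4A1"): amplicon_length("ISSa4S", "ISSa4A1"),
    ("GBSi1S2", "GBSi1A2"): amplicon_length("GBSi1S2", "GBSi1A2"),
}
primers_are_complementary = reverse_complement(IS1548S1) == IS1548A1.replace(" ", "")
```

The same subtraction can be applied to either numbering of IS1381 (AF064785 or AF367974), which is why that element has two expected lengths for the IS1381S1/IS1381A pair.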






Table 12. Relationship between mobile genetic elements and capsular
polysaccharide serotypes, serotype III subtypes and surface protein gene
profiles.

Serotype/      Protein gene   N=     IS861      IS1548    IS1381     ISSa4    GBSi1     No mobile
serosubtype    profile                                                                  element

Ia             AaB            2      2          -         2          -        -         -
Ia             alp2as         3      -          -         -          -        -         3
Ia             a              35     3          1         35         1        -         -
Ia             as             3      -          -         3          -        -         -
subtotal                      43     5          1         40         1        -         3
Ib             Aa             1      -          -         -          -        -         1
Ib             AaB            35     30         -         35         1        -         -
Ib             alp3           1      -          -         1          -        -         -
subtotal                      37     30         -         36         1        -         1
II             Aa             3      3          1         3          2        1         -
II             AaB            10     10         5         10         5        1         -
II             alp3           2      1          1         2          -        -         -
II             R              8      8          -         8          -        8         -
II             Ra             1      1          -         -          -        1         -
II             a              5      2          2         5          3        5         -
subtotal                      29     25         9         28         10       16        -
III-1          R              30     30         30        30         1        -         -
III-2          R              22     22         -         -          -        22        -
III-3          alp2as         5      -          -         -          -        -         5
III-4          AaB            1      1          -         1          -        1         -
III-4          alp2as         1      -          -         -          -        1         -
III-4          alp3           1      -          -         1          -        1         -
subtotal                      60     53         30        32         1        25        5
IV             R              1      1          -         1          -        1         -
IV             a              8      2          -         8          -        -         -
subtotal                      9      3          -         9          -        1         -
V              alp3           35     3          1         35         1        1         -
V              R              1      1          -         1          1        -         -
V              RB             1      1          -         1          -        -         -
V              none           1      -          -         -          -        -         1
subtotal                      38     5          1         37         2        1         1
VI             Aa             1      -          -         1          -        -         -
VI             AaB            3      3          -         3          -        -         -
VI             a              1      -          -         1          -        -         -
subtotal                      5      3          -         5          -        -         -
VII            alp3           1      -          -         1          -        -         -
VIII           alp3           1      -          -         1          -        -         -
VIII           none           1      -          -         1          -        -         -
subtotal                      2      -          -         2          -        -         -
Total (%)                     224    124 (55)   41 (18)   190 (85)   15 (7)   43 (19)   10 (4)

Note.

A: 5'-end of bca gene (C alpha protein);
a: bca gene repetitive unit or bca gene repetitive unit-like sequence (multiple band
amplicon);
as: bca gene repetitive unit or bca gene repetitive unit-like sequence (single band
amplicon);
B: C beta/IgA binding protein (bac) gene;
R: Rib protein (rib) gene;
alp2: C alpha-like protein 2 (alp2) gene;
alp3: C alpha-like protein 3 (alp3) gene;
r: assumed Rib-like protein gene.

Table 13. Distribution of mobile genetic elements among 194 invasive
GBS isolates.

              Mobile genetic elements present
Total N =     IS1381       IS861        IS1548      ISSa4      GBSi1       None

6             -            -            -           -          -           6
78            78           -            -           -          -           -
2             -            -            -           -          2           -
37            37           37           -           -          -           -
1             1            -            1           -          -           -
3             3            -            -           3          -           -
29            29           29           29          -          -           -
6             6            6            -           6          -           -
8             8            8            -           -          8           -
18            -            18           -           -          18          -
1             1            -            -           -          1           -
1             1            -            1           -          1           -
2             2            2            2           -          2           -
2             2            -            -           2          2           -
Total         168 (87%)    100 (52%)    33 (17%)    11 (6%)    34 (18%)    6 (3%)
(n=194)

Note.

Data are numbers of isolates containing various combinations of mge.

Table 14. Relationship between GBS genotypes and invasive disease age.

Serotype/     Age-group/disease¹
genotype      0-6 d        7 d-3 m       4 m-14 yr    15-45 yr     46-60 yr     >60 yr         Total

Ia-1          14           4+1           1            7            3            6              35+1 (19%)
Ia-(2-8)      4            2             -            1            -            3              10
Ia total      18 (34%)     6+1 (21%)     1 (10%)      8 (28%)      3 (18%)      9 (17%)        45+1 (24%)
Ib-1          2            1+1           -            3            2            5+1            13+2
Ib-(2-16)     3            4+2           -            3            1            5              16+2
Ib total      5 (9.4%)     5+3 (24%)     -            6 (21%)      3            10+1           29+4 (17%)
II            8 (15%)      1 (3%)        -            4+1 (17%)    1            4 (7%)         18+1 (10%)
III-1         6+1 (13%)    4 (12%)       1+1 (20%)    1+1 (7%)     6+1 (41%)    4              22+4 (13%)
III-2         5 (9%)       5+4 (39%)³    1 (10%)      2            -            -              13+4 (9%)
III-(3-4)     1+1          1             -            1            1            1              5+1
III total     12+2 (26%)   10+4 (41%)    2+1 (30%)    4+1 (17%)    7+1 (44%)    5 (9%)         40+9 (25%)
IV total      3            -             -            -            -            4              7 (4%)
V-1           3            3             2            4            2            13+1           27+1 (14%)
V-(2-7)       1            1             -            1            -            4              7
V total       4 (8%)       4 (12%)       2 (20%)      5 (17%)      2 (11%)      17+1 (33%)⁴    34+1 (18%)
VI total      1            -             -            -            +1           3              4+1 (3%)
TOTAL         51+2=53      26+8=34       5+2=7        27+1=28      16+2=18      52+2=54        177+17=194

Notes:

1. Numbers after "+" refer to CSF isolates; all others are from blood.

2. Five cases were aged 4 m-1 yr and one case was aged 3 yr.

3. Sst III-2 in late onset infection compared with all other groups: p=0.0005, odds
ratio (OR) 6.8; 95% confidence interval (CI) 2.4-19.4.

4. MS V in elderly compared with all other age-groups: p=0.001, OR 0.28; 95% CI
0.13-0.59.
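
The odds ratios quoted in the notes to Table 14 can be reproduced from the table counts. The Python sketch below assumes a Woolf (log-based) 95% confidence interval; the text does not say which interval method was used, but this one gives the reported values when the counts are read from the table as shown. The function name and the 2x2 layouts are illustrative choices.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Woolf (log-based) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Note 3: sst III-2 in late onset infection (7 d-3 m) versus all other groups.
# Late onset: 9 of 34 isolates are III-2 (5 blood + 4 CSF); overall: 17 of 194.
a, b = 9, 34 - 9                  # late onset: III-2, not III-2
c = 17 - 9                        # other groups: III-2
d = (194 - 34) - c                # other groups: not III-2
or_iii2, lo_iii2, hi_iii2 = odds_ratio_ci(a, b, c, d)

# Note 4: MS V in the elderly (>60 yr): 18 of 54 isolates; overall: 35 of 194.
# The reported OR of 0.28 corresponds to the odds of MS V in the other
# age-groups relative to the elderly (an assumption about orientation).
v_other, nonv_other = 35 - 18, (194 - 54) - (35 - 18)
v_elderly, nonv_elderly = 18, 54 - 18
or_v, lo_v, hi_v = odds_ratio_ci(v_other, nonv_other, v_elderly, nonv_elderly)
```

Rounded to the precision used in the notes, the first comparison gives 6.8 (2.4-19.4) and the second 0.28 (0.13-0.59).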
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CLAIMS 

1. A method of typing a group B streptococcal bacterium which method
comprises analysing the nucleotide sequence of one or more regions within the
cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes of said bacterium, said region(s)
comprising one or more nucleotides whose sequence varies between types.

2. A method according to claim 1 wherein the nucleotide sequence is analysed 
for one or more positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 
204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 
636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

3. A method according to claim 1 wherein at least one region is within a
sequence delineated by the 3' 136 bases of the cpsE gene and the 5' 218 bases of
the cpsG gene of the cpsE-cpsF-cpsG gene cluster of said streptococcal
bacterium.

4. A method according to claim 3 wherein the nucleotide sequence is analysed 
for one or more positions corresponding to positions 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

5. A method according to any one of claims 1 to 4 wherein at least one region
is within the cpsI/M genes of said bacterium.

6. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises sequencing said one or more regions. 

7. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises determining whether a polynucleotide obtained 
from said bacterium selectively hybridises to a polynucleotide probe comprising 
one or more of the said regions. 

8. A method according to claim 7 which comprises determining whether the 
polynucleotide obtained from said bacterium hybridises to one or more of a 
plurality of polynucleotide probes corresponding to one or more of the said regions.

9. A method according to claim 8 wherein the plurality of polynucleotide probes
are present as a microarray.

10. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises an amplification step using one or more 
primers, at least one of which hybridises specifically to a sequence which differs 
between types. 

11. A method according to any one of claims 1 to 6 wherein the nucleotide
sequence analysis step comprises an amplification step using primer pairs, at least
one of which hybridises specifically to a sequence which differs between types.

12. A method according to claim 10 or claim 11 wherein said primers are 
selected from the primers shown in Table 2. 

13. A method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more surface protein genes selected from rib, alp2 or alp3 genes.

14. A method according to claim 13 wherein determining the presence or 
absence of said surface protein genes comprises determining whether a 
polynucleotide obtained from said bacterium selectively hybridises to a 
polynucleotide probe corresponding to a region of said surface protein genes. 

15. A method according to claim 13 wherein determining the
presence or absence of said surface protein genes comprises an amplification step
using one or more primers which amplify specifically a region of said surface
protein genes.

16. A method according to claim 15 wherein said primers are selected from the 
primers shown in Table 6. 

17. A method according to any one of claims 1 to 12 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more surface protein genes selected from rib, alp2 or alp3 genes.

18. A method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more mobile genetic elements selected from IS861, IS1548, IS1381,
ISSa4 and GBSi1.

19. A method according to claim 18 wherein determining the presence or 
absence of said mobile genetic elements comprises determining whether a 
polynucleotide obtained from said bacterium selectively hybridises to a 
polynucleotide probe corresponding to a region of said mobile genetic elements. 

20. A method according to claim 18 wherein determining the
presence or absence of said mobile genetic elements comprises an amplification
step using one or more primers which amplify specifically a region of said mobile
genetic elements.

21. A method according to claim 20 wherein said primers are selected from the
primers shown in Table 10.

22. A method according to any one of claims 13 to 17 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more mobile genetic elements selected from IS861, IS1548, IS1381, ISSa4 and
GBSi1.

23. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a cpsD-cpsE-cpsF-cpsG gene of a group B 
streptococcal bacterium, said polynucleotide comprising one or more nucleotides 
which differ between group B streptococcal serotypes. 

24. A polynucleotide according to claim 23 wherein said nucleotides which differ 
between group B streptococcal serotypes correspond to one or more of positions 
62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 
457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 
1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 
1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as 
shown in Figure 1.

25. A polynucleotide consisting essentially of at least 10 contiguous nucleotides
corresponding to a region within a sequence delineated by the 3' 136 base pairs of
cpsE and the 5' 218 base pairs of cpsG of the cpsE-cpsF-cpsG gene cluster of a
group B streptococcal bacterium, said polynucleotide comprising one or more
nucleotides which differ between group B streptococcal types.

26. A polynucleotide according to claim 25 wherein said nucleotides which differ 
between group B streptococcal types correspond to one or more of positions 1413, 
1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 
1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in 
Figure 1.

27. A polynucleotide consisting essentially of at least 10 contiguous nucleotides
corresponding to a region within a cpsI/M gene of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ
between streptococcal serotypes.

28. A polynucleotide according to claim 27 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 2. 

29. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a rib, alp2 or alp3 gene of a group B streptococcal 
bacterium, said polynucleotide comprising one or more nucleotides which differ 
between group B streptococcal subtypes. 

30. A polynucleotide according to claim 29 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 6. 

31. Use of a polynucleotide according to any one of claims 23 to 30 in a method
of serotyping and/or subtyping a group B streptococcal bacterium.

32. A composition comprising a plurality of polynucleotides according to any 
one of claims 23 to 30. 

33. Use of a composition according to claim 32 in a method of serotyping and/or 
subtyping a group B streptococcal bacterium. 



34. A microarray comprising a plurality of polynucleotides according to any one 
of claims 23 to 30.

35. Use of a microarray according to claim 34 in a method of serotyping and/or
subtyping a group B streptococcal bacterium.

ABSTRACT

MOLECULAR TYPING OF GROUP B STREPTOCOCCI

Molecular methods are provided for typing group B streptococci, as well as
polynucleotides useful in such methods.

Figure 1. Multiple sequence alignments of the regions of the 3' end of cpsD-cpsE-cpsF and
the 5' end of cpsG for reference strains of serotypes Ia to VII.

                      1                                                   50
Serosubtype III-2
Serotype VI
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV
Serotype V
Serosubtype Ia-2
Consensus             GCAAAAGAAC AGATGGAACA AAGTGGTTCA AAGTTCTTAG GTATTATTCT
                      cpsDS

                      51                                                 100
Serosubtype III-2
Serotype VI
Serotype Ib
Serotype II/III-4     g
Serotype VII          g
Serosubtype III-3
Serosubtype Ia-1      g
Serosubtype III-1
Serotype IV
Serotype V
Serosubtype Ia-2      g
Consensus             TAATAAAGTT AATGAATCTG TTGCTACTTA CGGCGATTAC GGCGATTATG

                      101                                                150
Serosubtype III-2     a g
Serotype VI
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3     r
Serosubtype Ia-1
Serosubtype III-1
Serotype IV
Serotype V            r
Serosubtype Ia-2
Consensus             GAAATTACGG AAAAAGGGAT AGAAAAAGGA AGTAAGGGGC TCTTGTATTG
                      cpsD |

                      151                                                200
Serosubtype III-2
Serotype VI
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV
Serotype V            M
Serosubtype Ia-2
Consensus             AAAGAAAAAG AAAATATACA AAAGATTATT ATAGCGATGA TTCAAACAGT
                      | cpsE

                      201                                                250
Serosubtype III-2     a
Serotype VI           g t c
Serotype Ib           c
Serotype II/III-4
Serotype VII
Serosubtype III-3     a
Serosubtype Ia-1
Serosubtype III-1
Serotype IV           c
Serotype V            y
Serosubtype Ia-2
Consensus             TGTGGTTTAT TTTTCTGCAA GTTTGACATT AACATTAATT ACTCCCAATT

                      251                                                300
Serosubtype III-2     t
Serotype VI
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1
Serotype IV           t
Serotype V
Serosubtype Ia-2
Consensus             TTAAAAGCAA TAAAGATTTA TTGTTTGTTC TATTGATACA TTATATTGTC

                      301                                                350
Serosubtype III-2
Serotype VI
Serotype Ib
Serotype II/III-4
Serotype VII
Serosubtype III-3
Serosubtype Ia-1
Serosubtype III-1     t
Serotype IV
Serotype V
Serosubtype Ia-2
Consensus             TTTTATCTTT CTGATTTTTA CAGAGACTTT TGGAGTCGTG GCTATCTTGA
                      cpsES

351 400 

Serosubtype III-2 • 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 
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Serotype V M 

Serosubtype Ia-2 

Consensus AAAGAAAAAG AAAATATACA AAAGATTATT ATAGCGATGA TTCAAACAGT 

I cpsE 

201 250 

Serosubtype III-2 a 

Serotype VI g 1 — - c 

Serotype lb c _ 

Serotype II/III-4 

Serotype VII — _ 

Serosubtype III-3 a

Serosubtype Ia-1

Serosubtype III-l , 

Serotype IV c _ 

Serotype V y- 

Serosubtype Ia-2 - 

Consensus TGTGGTTTAT TTTTCTGCAA GTTTGACATT AACATTAATT ACTCCCAATT 

251 300 

Serosubtype III-2 t 

Serotype VI — _ 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 - . 

Serosubtype Ia-1 — 

Serosubtype III-l 

Serotype IV 1 

Serotype V - 

Serosubtype Ia-2 . 

Consensus TTAAAAGCAA TAAAGATTTA TTGTTTGTTC TATTGATACA TTATATTGTC 



301 350 

Serosubtype III-2 

Serotype VI -: 

Serotype lb ■ : = »_ : 

Serotype II/III-4 

Serotype VII

Serosubtype III-3 

Serosubtype Ia-1 — : : ; 

Serosubtype III-l t 

Serotype IV 

Serotype V : — . 

Serosubtype Ia-2 

Consensus TTTTATCTTT CTGATTTTTA CAGAG ACTTT TGGAGTCGTG GCTATCTTGA 

cpsES 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 



351 400 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AGAGTTTAAA ATGGTATTGA AATACAGCTT TTACTATATT TTCATATCAA

401 450 

Serosubtype III-2 

Serotype VI 

Serotype lb c _ 

Serotype II/III-4 a _ 

Serotype VII ■ a _ t 

Serosubtype III-3 : . 

Serosubtype Ia-1 — : a _ 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 — : a _ 

Consensus GTTCATTATT TTTTATTTTT AAAAACTCTT TT ACAACGAC ACGACTTTCC

cpsEA1

451 500
Serosubtype III-2 

Serotype VI — : ; , ; 

Serotype lb 

Serotype II/III-4 c 

Serotype VII c ' 

Serosubtype III-3 g 

Serosubtype Ia-1 1 g 

Serosubtype III-l — 

Serotype IV a 

Serotype V i 

Serosubtype Ia-2 1 --- g 

Consensus TTTTTTAC TT TTATT GCTAT GAATTCGATT TTATTATATC TATTGAATTC

501 550 

Serosubtype III-2 . . 

Serotype VI . ? 

Serotype lb 

Serotype II/III-4 

Serotype VII : 

Serosubtype III-3 . 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV : 

Serotype V 

Serosubtype Ia-2 . 

Consensus ATTTTTAAAA TATTAT CGAA AATATTCTTA CGCTAAGTTT TCACGAGATA 

551 600 

Serosubtype III-2 

Serotype VI 

Serotype lb '. . 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 . 

Serosubtype Ia-1 . 

Serosubtype III-l 

Serotype IV 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AGAGTTTAAA ATGGTATTGA AATACAGCTT TTACTATATT TTCATATCAA

401 450 

Serosubtype III-2 

Serotype VI . 

Serotype lb c- 

Serotype II/III-4 a- 

Serotype VII a- 1 

Serosubtype III-3 . 

Serosubtype Ia-1 — a- 

Serosubtype III-l 

Serotype IV 

Serotype V r -■ 

Serosubtype Ia-2 a- 

Consensus GTTCATTATT TTTTATTTTT AAAAACTCTT TT ACAACGAC ACGACTTTCC 

cpsEA1

451 500 

Serosubtype III-2 

Serotype VI — — . 

Serotype lb 

Serotype II/III-4 c 

Serotype VII c — 

Serosubtype III-3 — g 

Serosubtype Ia-1 1 g 

Serosubtype III-l — 

Serotype IV a 

Serotype V 

Serosubtype Ia-2 1 g 

Consensus TTTTTTAC TT TTATT GCTAT GAATTCGATT TTATTATATC TATTGAATTC 

501 550 

Serosubtype III-2 

Serotype VI

Serotype lb - : ■ • — — 

Serotype II/III-4 

Serotype VII — — 

Serosubtype III-3 

Serosubtype Ia-1 — 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus ATTTTTAAAA TATTAT CGAA AATATTCTTA CGCTAAGTTT TCACGAGATA

551 600 

Serosubtype III-2 

Serotype VI -. 

Serotype lb 

Serotype II/III-4 

Serotype VII — 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV ■ 
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Serotype V — 

Serosubtype Ia-2 ! — 

Consensus CCAAAGTTGT TTTGATAACG AATAAGGATT CTTTATCAAA AATGACCTTT 

601 650 

Serosubtype III-2

Serotype VI c 

Serotype lb — . 1 

Serotype II/III-4 -a

Serotype VII -a 

Serosubtype III-3 

Serosubtype Ia-1 — ■ 1 c 

Serosubtype III-l c . 

Serotype IV : 1 

Serotype V : . 1 

Serosubtype Ia-2 1 c 

Consensus AGGAATAAAT ACGACCATAA TTATATCGCT GTCTGTAT CT TGGACTCCTC 

651 700 

Serosubtype III-2

Serotype VI : : 

Serotype lb 

Serotype II/III-4 — 

Serotype VII 

Serosubtype III-3 - 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV • — 

Serotype V 

Serosubtype Ia-2 

Consensus TGAAAAGGAT TG TTATGATT TGAAACATAA CTCGTTAAGG ATAATAAACA 
cpsES1

701 750 

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 K 

Consensus AAGATGCTCT TACTTCAGAG TTA ACCTGCT TAACTGTTGA TCAAGCTTTT 

cpsEA2 

751 800 

Serosubtype III-2 — 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 . : 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV — . 

Serotype V 
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Serotype V 

Serosubtype Ia-2 — 

Consensus CCAAAGTTGT TTTGATAACG AATAAGGATT CTTTATCAAA AATGACCTTT 

601 650 

Serosubtype III-2 - : 

Serotype VI c 

Serotype lb ► 1 

Serotype II/III-4 -a 

Serotype VII -a 

Serosubtype III-3 — 

Serosubtype Ia-1 1 c — 

Serosubtype III-l c 

Serotype IV = — 1 

Serotype V 1 

Serosubtype Ia-2 -t c 

Consensus AGGAATAAAT ACGACCATAA TTATATCGCT GTCTGTAT CT TGGACTCCTC 

651 700 

Serosubtype III-2 : 

Serotype VI 

Serotype lb : — 

Serotype II/III-4 : , — 

Serotype VII ~ 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l ~ 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus TGAAAAGGAT TG TTATGATT TGAAACATAA CTCGTTAAGG ATAATAAACA 
cpsES1

701 750

Serosubtype III-2 

Serotype VI — 

Serotype lb 

Serotype II/III-4 — ' > — * 

Serotype VII — — 

Serosubtype III-3 — 

Serosubtype Ia-1 — — 

Serosubtype III-l • 

Serotype IV 

Serotype V 

Serosubtype Ia-2 K 

Consensus AAGATGCTCT TACTTCAGAG TTA ACCTGCT TAACTGTTGA TCAAGCTTTT 

cpsEA2 

751 800 

Serosubtype III-2 

Serotype VI : : 

Serotype lb . 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — . . . . i — — 

Serosubtype Ia-1 

Serosubtype III-l — 

Serotype IV 

Serotype V 
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Serosubtype Ia-2 — 

Consensus ATTAACATAC CCATTGAATT ATTTGGTAAA TACCAAATAC AAGATATTAT 

801 850 

Serosubtype III-2 

Serotype VI — t 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV • 

Serotype V : 

Serosubtype Ia-2 — 

Consensus TAAT GACATT GAAGCAATGG GAGTGATTGT CAATGTTAAT GTAGAGGCAC 



851 900 

Serosubtype III-2 — 

Serotype VI 

Serotype lb 

Serotype II/III-4 — 

Serotype VII , 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l — 

Serotype IV • — 

Serotype V 

Serosubtype Ia-2 - 

Consensus TTAGCTTTGA TAATATAGGA GAAAAGCGAA TCCAAACTTT TGAAGGATAT 

901 950 

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 . 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 — ■ -- 

Serosubtype III-l 

Serotype IV — ' 

Serotype V 

Serosubtype Ia-2 

Consensus AGTGTTATTA CATATTCTAT GAAATTCTAT AAATATAGTC ACCTTATAGC

951 1000 

Serosubtype III-2 

Serotype VI t

Serotype lb : t 

Serotype II/III-4 t 

Serotype VII :-- : t 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 
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Serosubtype Ia-2 • 

Consensus ATTAACATAC CCATTGAATT ATTTGGTAAA TACCAAATAC AAGATATTAT

801 850 

Serosubtype III-2 — 

Serotype VI — t 

Serotype lb 

Serotype II/III-4 

Serotype VII — ■ 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus TAAT GACATT GAAGCAATGG GAGTGATTGT CAATGTTAAT GTAGAGGCAC 

851 900 

Serosubtype III-2 

Serotype VI ; 

Serotype lb r 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV : — 

Serotype V 

Serosubtype Ia-2 

Consensus TTAGCTTTGA TAATATAGGA GAAAAGCGAA TCCAAACTTT TGAAGGATAT 



901 950 

Serosubtype III-2 

Serotype VI 

Serotype lb — — 

Serotype II/III-4 » 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 — — ■ — : : : 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AGTGTTATTA CATATTCTAT GAAATTCTAT AAATATAGTC ACCTTATAGC 



951 1000 

Serosubtype III-2 . — : 

Serotype VI t ■ 

Serotype lb t ■ — — — 

Serotype II/III-4 : — t < 

Serotype VII t : 

Serosubtype III-3 -: 

Serosubtype Ia-1 : 

Serosubtype III-l 
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Serotype IV — . 

Serotype V 

Serosubtype Ia-2 

Consensus AAAACGATTT TTGGATATCA CGGGTGCTAT TATAGGTTTG CTCATATGTG

1001 1050 

Serosubtype III-2 

Serotype VI c 

Serotype lb _ 

Serotype II/III-4 

Serotype VII — 

Serosubtype III-3 a _ 

Serosubtype Ia-1 — a 

Serosubtype III-l 

Serotype IV — a 

Serotype V ■ a 

Serosubtype Ia-2 a 

Consensus GCATTGTGGC AATTTTTCTA GTTCCGCAAA TCAGAAA AGA TGGTGGACCG 

1051 1100 

Serosubtype III-2 

Serotype VI . 

Serotype lb 

Serotype II/III-4 : 

Serotype VII 

Serosubtype III-3 ■ . 

Serosubtype Ia-1 _ _ 

Serosubtype III-l 

Serotype IV 

Serotype V — 

Serosubtype Ia-2 

Consensus GCTATCTTTT CTCA AAATAG AGTAGGT CGT AATGGTAGGA TTTTTAGATT 
cpsES2 

1101 1150

Serosubtype III-2 -- 

Serotype VI — 

Serotype lb 

Serotype II/III-4 

Serotype VII . ■ _ 

Serosubtype III-3 . 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 : 

Consensus CTATAAATTC AGATCAAT GC GAGTAGATGC AGAACAAATT AAGA AAGATT 

cpsEA3 

1151 1200 

Serosubtype III-2 a 

Serotype VI a 

Serotype lb — g 

Serotype II/III-4 

Serotype VII — . 

Serosubtype III-3 a 

Serosubtype Ia-1 

Serosubtype III-l a 

Serotype IV a 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AAAACGATTT TTGGATATCA CGGGTGCTAT TATAGGTTTG CTCATATGTG

1001 1050 

Serosubtype III-2 

Serotype VI c 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 . a 

Serosubtype Ia-1 a 

Serosubtype III-l 

Serotype IV a 

Serotype V a 

Serosubtype Ia-2 - : a 

Consensus GCATTGTGGC AATTTTTCTA GTTCCGCAAA TCAGAAA AGA TGGTGGACCG 

1051 1100

Serosubtype III-2 • 

Serotype VI . 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 ; . 

Serosubtype III-l 

Serotype IV 

Serotype V . 

Serosubtype Ia-2 

Consensus GCTATCTTTT CTCA AAATAG AGTAGGT CGT AATGGTAGGA TTTTTAGATT
cpsES2

1101 1150
Serosubtype III-2 . 

Serotype VI — : 

Serotype lb 

Serotype II/III-4 . 

Serotype VII

Serosubtype III-3 . 

Serosubtype Ia-1 

Serosubtype III-l . 

Serotype IV 

Serotype V 

Serosubtype Ia-2 . 

Consensus CTATAAATTC AGATCAAT GC GAGTAGATGC AGAACAAATT AAGA AAGATT

cpsEA3

1151 1200
Serosubtype III-2 a 

Serotype VI . : a 

Serotype lb — g 

Serotype II/III-4 

Serotype VII

Serosubtype III-3 - a 

Serosubtype Ia-1

Serosubtype III-l a 

Serotype IV . a 
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Serotype V 

Serosubtype Ia-2 ■ 

Consensus TATTAGTTCA CAATCAAATG ACAGGGCTAA TGTTTAAGTT AGACGATGAT 

1201 1250

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII . 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus CCTAGAATTA CTAAAATAGG AAAATTTATT CGAAAAACAA GCATAGATGA 

1251 1300 

Serosubtype III-2 

Serotype VI a 

Serotype lb 

Serotype II/III-4 ! 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 - 

Serosubtype III-l 

Serotype IV 

Serotype V g — 

Serosubtype Ia-2 

Consensus GTTGCCTCAA TTCTATAATG TTTTAAAAGG TGATATGAGT TTAGTAGGAA 

1301 1350 

Serosubtype III-2 

Serotype VI 4 — : 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 . 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus CACGCCCTCC CACAGTT GAT GAATATGAAA AGTATAATTC AACGCAGAAG

1351 1400 

Serosubtype III-2 , — 

Serotype VI 

Serotype lb ■ 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l . - — : 
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Serotype V 

Serosubtype Ia-2 

Consensus TATTAGTTCA CAATCAAATG ACAGGGCTAA TGTTTAAGTT AGACGATGAT 

1201 1250 

Serosubtype III-2 : 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 ^ 

Serosubtype Ia-1 

Serosubtype III-l . 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus CCTAGAATTA CTAAAATAGG AAAATTTATT CGAAAAACAA GCATAGATGA 

1251 1300 

Serosubtype III-2 , 

Serotype VI a 

Serotype lb 

Serotype II/III-4 ■ 

Serotype VII 

Serosubtype III-3 . _ 

Serosubtype Ia-1 

Serosubtype III-l . 

Serotype IV 

Serotype V — g — 

Serosubtype Ia-2 _ 

Consensus GTTGCCTCAA TTCTATAATG TTTTAAAAGG TGATATGAGT TTAGTAGGAA

1301 1350 

Serosubtype III-2 

Serotype VI — : -- 

Serotype lb r: — _■ — ■_. :_ 

Serotype II/III-4 '. 

Serotype VII : - r 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV . 

Serotype V 

Serosubtype Ia-2 

Consensus CACGCCCTCC CACAGTT GAT GAATATGAAA AGTATAATTC AACGCAGAAG 

1351 1400 

Serosubtype III-2 - : 

Serotype VI : — '. 

Serotype lb — : 

Serotype II/III-4

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l — - 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus CGACGCCTTA GTTTTAAGCC AGGAATCACT GGTTTGTGGC AAATATCTGG 

1401 1450 

Serosubtype III-2 

Serotype VI 

Serotype lb — --- — 

Serotype II/III-4 — — 

Serotype VII ■ . . 

Serosubtype III-3 — c 

Serosubtype Ia-1 — c 

Serosubtype III-l 

Serotype IV 

Serotype V j — 

Serosubtype Ia-2 — c : 

Consensus TAGAAATAAT ATTACTGATT TTGATGAAAT CGTAA AGTTA GATGTTCAAT 

1451 1500 

Serosubtype III-2 

Serotype VI a 

Serotype lb g 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — : 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 * 

Consensus ATATCAATGA ATGGTCTATT TGGTCAG ATA TTAAGATTAT TCTCCTAACA 

cpsES3 

1501 1550 

Serosubtype III-2 -t c — r 

Serotype VI -t c — 

Serotype lb -t c — — 

Serotype II/III-4 t : 

Serotype VII t 

Serosubtype III-3 1 

Serosubtype Ia-1 1 

Serosubtype III-l -t c — : — 

Serotype IV 1 

Serotype V -t c — 

Serosubtype Ia-2 1 

Consensus CTAAAGGTAG TCTTACTTGG GACAG G AGCT AAGTAAAGGT AAGGTTTGAA 

cpsE | cpsEFA 
1551 1600 

Serosubtype III-2 ■ 

Serotype VI — c ■- 

Serotype lb c 

Serotype II/III-4 : 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 — 

Serosubtype III-l 

Serotype IV — — 
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Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
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CGACGCCTTA GTTTTAAGCC AGGAAT CACT GGTTTGTGGC AAATATCTGG 
1401 1450 



TAGAAATAAT ATTACTGATT TT GAT GAAAT CGTAA AGTTA GATGTTCAAT 
1451 1500 



g 



ATATCAATGA AT GGT CTATT TGGT CAG ATA TTAAGATTAT TCTCCTAACA 
cpsES3 

1501 1550 

_ t — c — 

-t c — 

t 

t 

1 

1 — : 

1 c ; 

1 

_ t C : 

1 

CTAAAGGTAG TCTTACTTGG GACAGG AGCT AAGTAAAGGT AAGGTTTGAA 

cpsE | cpsEFA 
1551 1600 



c 
c 
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Serotype V 

Serosubtype Ia-2 : 

Consensus AGGAATATAA TGAAAATTTG TCTGGTTGGT TCAAGTGGTG GTCATCTAGC 

I cpsF 

1601 1650 

Serosubtype III-2 - 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII t 1 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 1 

Serotype V . 

Serosubtype Ia-2 

Consensus ACACTTGAAC CTTTTGAAAC CCATTTGGGA AAAAGAAGAT AGGTTTTGGG 

1651 1700 

Serosubtype III-2 . 

Serotype VI 

Serotype lb 1- 

Serotype II/III-4 

Serotype VII ._ 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V _ 

Serosubtype Ia-2 

Consensus TAACCTTTGA TAAAGAAGAT GCTAGGAGTA TTCTAAGAGA AGAGATTGTA 

1701 1750 

Serosubtype III-2 . 

Serotype VI '. 

Serotype lb 

Serotype II/III-4 . 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 . 

Serosubtype III-l . 

Serotype IV ■ 

Serotype V 

Serosubtype Ia-2 

Consensus TATCATTGCT TCTTTCCAAC AAACCGTAAT GTCAAAAACT TGGTAAAAAA 

1751 1800 

Serosubtype III-2 

Serotype VI 

Serotype lb — 

Serotype II/III-4 

Serotype VII

Serosubtype III-3 — 

Serosubtype Ia-1 
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Serotype V — 

Serosubtype Ia-2 

Consensus AGGAATATAA TGAAAATTTG TCTGGTTGGT TCAAGTGGTG GTCATCTAGC

I cpsF 

1601 1650 

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII t 1 

Serosubtype III-3 • 

Serosubtype Ia-1 : — 

Serosubtype III-l — : , 

Serotype IV 1 

Serotype V _ 

Serosubtype Ia-2 : 

Consensus ACACTTGAAC CTTTTGAAAC CCATTTGGGA AAAAGAAGAT AGGTTTTGGG 



1651 1700 

Serosubtype III-2 

Serotype VI _ . 

Serotype lb 1 — 

Serotype II/III-4 — 

Serotype VII 

Serosubtype III-3 -. 

Serosubtype Ia-1 — — 

Serosubtype III-l 

Serotype IV 

Serotype V — -i 

Serosubtype Ia-2 

Consensus TAACCTTTGA TAAAGAAGAT GCTAGGAGTA TTCTAAGAGA AGAGATTGTA

1701 1750 

Serosubtype III-2 — ■ — — 

Serotype VI -- ---- :•- — - 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV ■ 

Serotype V - : ■ 

Serosubtype Ia-2 : — 

Consensus TATCATTGCT TCTTTCCAAC AAACCGTAAT GTCAAAAACT TGGTAAAAAA

1751 1800 

Serosubtype III-2 = 

Serotype VI 

Serotype lb > — .- 

Serotype II/III-4 '- 

Serotype VII 

Serosubtype III-3 •- 

Serosubtype Ia-1 
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Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 



TACTATTCTA GCTTTTAAGG TCCTTAGAAA AGAAAGACCA GATGTTATCA 
1801 1850 



t 



TAT CAT CT GG TGCCGCTGTA GCAGTACCAT T CTTTTATAT T GGTAAGTTA 
cpsFS 

1851 1900 



a 



. c g 

TTTGGTTGTA AGACCGTTTA TATAGAGGTT TTCGACA GGA T AG AT AAAC C 
cpsFA 

1901 1950 



AACTTTGACA GGAAAATTAG T GT AT C CT GT AACAGATAAA TTTATTGTTC 
1951 2000 



a 
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Serosubtype III-l — 

Serotype IV '. 

Serotype V 

Serosubtype Ia-2 : 

Consensus TACTATTCTA GCTTTTAAGG TCCTTAGAAA AGAAAGACCA GATGTTATCA

1801 1850 

Serosubtype III-2 — 

Serotype VI - 

Serotype lb : 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 : : 

Serosubtype III-l — — 

Serotype IV -t — 

Serotype V 

Serosubtype Ia-2 . 

Consensus TAT CATCTGG TGCCGCTGTA GCAGTAC CAT T CTTTTATAT TGGTAAGTTA 

cpsFS 

1851 1900 

Serosubtype III-2 ■- 

Serotype VI 

Serotype lb c 

Serotype II/III-4 

Serotype VII a- — 

Serosubtype III-3 - 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV — 

Serotype V — c -g 

Serosubtype Ia-2 , _. 

Consensus TTTGGTTGTA AGACCGTTTA TATAGAGGTT TTCGACA GGA TAGATAAACC

cpsFA 

1901 1950 

Serosubtype III-2 

Serotype VI - --- — — . — , 

Serotype lb 

Serotype II/III-4 _ 

Serotype VII 

Serosubtype III-3 . 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 — : . 

Consensus AACTTTGACA GGAAAATTAG TGTATCCTGT AACAGATAAA TTTATTGTTC 

1951 2000 

Serosubtype III-2 . 

Serotype VI a 

Serotype lb 

Serotype II/III-4 -. : • 

Serotype VII — : 

Serosubtype III-3

Serosubtype Ia-1 — — 

Serosubtype III-l 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus AGTGGGAAGA AATGAAAAAA GTTTATCCTA AGGCAATTAA TTTAGGAGGA

2001 2050 

Serosubtype III-2 

Serotype VI 

Serotype lb a ■- 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V — : 

Serosubtype Ia-2 

Consensus ATTTTTTAAT GATTTTTGTC ACAGT GGGGA CACATGAACA GCAGTT CAAC 
cpsF | cpsG 



2051 



2100 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4
Serotype VII
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



CGTCTTATTA AAGAAGTTGA TAGATTAAAA GGGACAGGTG CT ATT GAT CA 



2101 



2150 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 



AGAAGTGTTC ATTCAAACGG GTTACT CAGA CTTTGAACCT CAGAATTGTC 



2151 



cpsGS 
2200 



-g- 
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Serotype IV : 

Serotype V . 

Serosubtype Ia-2 : 

Consensus AGTGGGAAGA AATGAAAAAA GTTTATCCTA AGGCAATTAA TTTAGGAGGA

2001 2050 

Serosubtype III-2 

Serotype VI 

Serotype lb a : 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — 

Serosubtype Ia-1 

Serosubtype III-l : . 

Serotype IV _ 

Serotype V 

Serosubtype Ia-2 - : : 

Consensus ATTTTTTAAT GATTTTTGTC ACAGT GGGGA CACATGAACA GCAGTT CAAC 
cpsF | cpsG 

2051 2100 

Serosubtype III-2 . 

Serotype VI 

Serotype lb 

Serotype II/III-4 — 

Serotype VII

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 1 

Serotype IV a — 

Serotype V 

Serosubtype Ia-2 

Consensus CGTCTTATTA AAGAAGTTGA TAGATTAAAA GGGACAGGTG CTATTGATCA 

2101 2150 

Serosubtype III-2 c 

Serotype VI — — _ — 

Serotype lb 

Serotype II/III-4 r - — 

Serotype VII

Serosubtype III-3 ■ 

Serosubtype Ia-1 

Serosubtype III-l c 

Serotype IV ■ — 

Serotype V 

Serosubtype Ia-2 

Consensus AGAAGTGTTC ATTCAAACGG GTTACT CAGA CTTTGAACCT CAGAATTGTC

cpsGS 

2151 2200

Serosubtype III-2 : 

Serotype VI 

Serotype lb 

Serotype II/III-4 . — 

Serotype VII ■- g g 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 
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Serotype IV - 

Serotype V 

Serosubtype Ia-2 — 

Consensus AGTGGTCA AA ATTT CTCTCA TAT GAT GAT A TGAACTCTTA CAT GAAAGAA 

cpsGA1 cpsGA2

2201 2226 

Serosubtype III-2 

Serotype VI 

Serotype lb c 

Serotype II/III-4 •— 

Serotype VII — 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l — ■ 

Serotype IV 

Serotype V — . 

Serosubtype Ia-2 

Consensus GCTGAGATTG TTAT CACACA TGGCGG 
cpsGA3 

Notes.

Numbering start point "1" refers to the start point "1" of GenBank accession number AF332908 (for
serotype IV reference strain 3139).

Serosubtype Ia-1: strain 090, GenBank accession number AF332893;

Serosubtype Ia-2: strain NZRM 908 (NCDC SS615), GenBank accession number AF332894;

Serotype Ib: strain H36B, GenBank accession number AF332903;

Serotype II/III-4: strain 18RS21, GenBank accession number AF332905;

Serosubtype III-1: strain SG99/056, GenBank accession number AF332899;

Serosubtype III-2: strain M781, GenBank accession number AF332896;

Serosubtype III-3: strain NZRM 912 (NCDC SS620), GenBank accession number AF332897;

III-4 (Subtype III-4): strain SG96/220, GenBank accession number AF363036;

Serotype IV: strain 3139, GenBank accession number AF332908;

Serotype V: strain CJB 111, GenBank accession number AF332910;

Serotype VI: strain SS1214, GenBank accession number AF332901;

Serotype VII: strain 7271, GenBank accession number AF332913.
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Serotype IV _ 

Serotype V 

Serosubtype Ia-2 

Consensus AGTGGTCAAA ATTT CTCTCA TAT GAT GAT A TGAACTCTTA CAT GAAAG AA 

cpsGA1 cpsGA2

2201 2226 

Serosubtype III-2 

Serotype VI 

Serotype lb c 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 — * 

Consensus GCTGAGATTG TTAT CACACA TGGCGG 
cpsGA3 

Notes.

Numbering start point "1" refers to the start point "1" of GenBank accession number AF332908 (for
serotype IV reference strain 3139).

Serosubtype Ia-1: strain 090, GenBank accession number AF332893;

Serosubtype Ia-2: strain NZRM 908 (NCDC SS615), GenBank accession number AF332894;

Serotype Ib: strain H36B, GenBank accession number AF332903;

Serotype II/III-4: strain 18RS21, GenBank accession number AF332905;

Serosubtype III-1: strain SG99/056, GenBank accession number AF332899;

Serosubtype III-2: strain M781, GenBank accession number AF332896;

Serosubtype III-3: strain NZRM 912 (NCDC SS620), GenBank accession number AF332897;

III-4 (Subtype III-4): strain SG96/220, GenBank accession number AF363036;

Serotype IV: strain 3139, GenBank accession number AF332908;

Serotype V: strain CJB 111, GenBank accession number AF332910;

Serotype VI: strain SS1214, GenBank accession number AF332901;

Serotype VII: strain 7271, GenBank accession number AF332913.
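
In the Figure 3 alignment, the Consensus row carries one symbol per column rather than a base. As a minimal sketch of how such a row could be recomputed, the Python below marks a column `*` when every serotype has the same base and `_` otherwise. The function names and the use of a single `_` for every variable column are assumptions for illustration; the printed figure appears to use more than one symbol for variable columns. The six rows are positions 1-50 of cpsG as printed in Figure 3.

```python
def consensus_mask(rows, same="*", diff="_"):
    """Build a per-column mask for equal-length aligned sequences.

    Assumption: `same` marks columns where all rows agree and `diff`
    marks every other column; the figure's own symbols may be richer.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    return "".join(same if len(set(col)) == 1 else diff for col in zip(*rows))


def in_tens(seq):
    """Group a string into space-separated blocks of ten, as in the figures."""
    return " ".join(seq[i:i + 10] for i in range(0, len(seq), 10))


# Positions 1-50 of cpsG, copied from the Figure 3 alignment rows.
figure3_rows = {
    "Serotype IV":  "ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
    "Serotype V":   "ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
    "Serotype Ia":  "ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
    "Serotype Ib":  "ATGATTTTTG TCACAGTAGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
    "Serotype III": "ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
    "Serotype VI":  "ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT",
}

# Drop the cosmetic spaces before comparing columns.
aligned = [row.replace(" ", "") for row in figure3_rows.values()]
print(in_tens(consensus_mask(aligned)))
# ********** *******_** ********** ********** **********
```

The single `_` falls at position 18, where serotype Ib has A and the other five serotypes have G.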
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Figure 3. Multiple sequence alignments of the gene sequences of the cpsG-cpsH- 
cpsI/M for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons were
highlighted).



              1                                                   50
Serotype IV   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype V    ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ia   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ib   ATGATTTTTG TCACAGTAGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype III  ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype VI   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Consensus     ********** *******_** ********** ********** **********
              cpsG

              51                                                 100
Serotype IV   TAAAGAAGTT GATAGATTAA AAGGGACAGA TGCTATTGAT CAAGAAGTGT
Serotype V    TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ia   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ib   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype III  TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype VI   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Consensus     ********** ********** *********_ ********** **********

              101                                                150
Serotype IV   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype V    TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ia   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ib   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype III  TCATTCAAAC GGGTTACTCA GACTTCGAAC CTCAGAATTG TCAGTGGTCA
Serotype VI   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Consensus     ********** ********** *****_**** ********** **********

              151                                                200
Serotype IV   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype V    AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ia   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ib   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype III  AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype VI   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Consensus     ********** ********** ********** ********** **********

              201                                                250
Serotype IV   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype V    TGTTATCACA CATGGCGGCC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ia   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ib   TGTTATCACA CACGGCGGTC CAGCAACGTT TATGAATGCA GTTTCTAAAG
Serotype III  TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Serotype VI   TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Consensus     ********** **_*****_* ****_***** ****   *   _*****  **
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Figure 3. Multiple sequence alignments of the gene sequences of the cpsG-cpsH- 
cpsI/M for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons were
highlighted).

              1                                                   50
Serotype IV   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype V    ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ia   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ib   ATGATTTTTG TCACAGTAGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype III  ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype VI   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Consensus     ********** *******_** ********** ********** **********
              cpsG

              51                                                 100
Serotype IV   TAAAGAAGTT GATAGATTAA AAGGGACAGA TGCTATTGAT CAAGAAGTGT
Serotype V    TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ia   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ib   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype III  TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype VI   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Consensus     ********** ********** *********_ ********** **********

              101                                                150
Serotype IV   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype V    TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ia   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ib   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype III  TCATTCAAAC GGGTTACTCA GACTTCGAAC CTCAGAATTG TCAGTGGTCA
Serotype VI   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Consensus     ********** ********** *****_**** ********** **********

              151                                                200
Serotype IV   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype V    AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ia   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ib   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype III  AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype VI   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Consensus     ********** ********** ********** ********** **********

              201                                                250
Serotype IV   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype V    TGTTATCACA CATGGCGGCC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ia   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ib   TGTTATCACA CACGGCGGTC CAGCAACGTT TATGAATGCA GTTTCTAAAG
Serotype III  TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Serotype VI   TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Consensus     ********** **_*****_* ****_***** ****   *   _*****  **
251 300
Serotype IV GGAAAAAAAC TATTGTGGTT CCTAGACAAG AACAGTTTGG AGAGCATGTG
Serotype V GAAAAAAAAC TATTGTGGTT CCTAGACAAG AACAGTTTGG AGAGCATGTG
Serotype Ia GGAAAAAAAC TATTGTGGTT CCTAGACAAG AACAGTTTGG AGAGCATGTG
Serotype Ib GGAAAAAAAC TATTGTGGTT CCTAGACAAG AACAGTTTGG AGAGCATGTG
Serotype III GGAAATTACC AGTTGTTGTT CCTAGGAGAA AGCAGTTTGG TGAACATATC
Serotype VI GGAAATTACC AGTTGTTGTT CCCAGGAGAA AGCAGTTTGG TGAACATATC
Consensus *_***__*_* __****_*** **_**___*_ *_******** _**_***_*_

301 350
Serotype IV AATAATCATC AGGTGGATTT TGTTAATAAG GTAAAAACAA TGTATAATTT
Serotype V AATAATCATC AGGTGGACTT TGTTAATAAG GTAAAAACAA TGTATAATTT
Serotype Ia AATAATCATC AGGTGGATTT TTTGAAAGAG TTATTCTTGA AAATTGAATT
Serotype Ib AATAATCATC AGGTGGATTT TTTGAAAGAG TTATTCTTGA AATATGAGTT
Serotype III AATGATCATC AAATACAATT TTTAAAAAAA ATTGCCCACC TGTATCCCTT
Serotype VI AATGATCATC AAATACAATT TTTAAATTCG ATTGCCCACC TGTATCCCTT
Consensus ***_****** *__*__*_** *_*_**____ _*________ ____*___**

351 400
Serotype IV TGATATCGTT GTAGATATTG AAAGGTTACA AAATGTAGTC TATGAGGGGA
Serotype V TGATATCGTT GTAGATATTG AAAGGTTACA AAATGTAGTC TATGAGGGAA
Serotype Ia AGATTATATT TTGAATATCA GTGAATTAGA GAATATTATT AAGGAAAAAA
Serotype Ib AGATTATATT TTGAATATCA GTGAATTAGA GAATATTATT AAGGAAAAAA
Serotype III GGCTTGGATT GAAGATGTAG ATGGACTTGC GGAAGCGTT. ..GAAAAGGA
Serotype VI GGCTTGGATT GAAGATGTAG ATGGACTTGC GGAAGCGTT. ..GAAAAGGA
Consensus _*_*____** ____**_*__ ______*___ __*_____*_ ____*____*

401 450
Serotype IV CGATGAATCG TCCGTTTTTA GAAACTAACA GAAGTAATTT TATT......
Serotype V TGATGAATCG TCCGTTTTTA GAAACTAATA GTAGTAATTT TATT......
Serotype Ia ATATATCTAC TAGTAAAGTA ATATCACAAA ACAATGATTT TTGTTTCTCT
Serotype Ib ATATATCTAC TAGTAAAGTA ATATCACAAA ACAATGATTT TTGTTCCTCT
Serotype III ATATAGCTAC AGAAAAATAT CAGGGAAATA ATGATATGTT TTGT......
Serotype VI ATATAGCTAC AGAAAAATAT CAGGGAAATA ATGATATGTT TTGT......
Consensus __**___*__ __________ _______*_* ____*___** *__*______

451 500
Serotype IV .......... .......... ....GAAGAA TTTAAGGTAA TATTAAAGGA
Serotype V .......... .......... ....GAAGAA TTTAAGGTAA TATTAAAGGA
Serotype Ia TTCAAAAATG AACATTTCAT AAACTATTTG AATAAATATA TTTTGTTGGA
Serotype Ib TTCAAAAATG AAC..TTTCT AAACTATTTG AATAAATATA TTTTGTTGGA
Serotype III .......... .......... ......CATA AATTAGAAAA AATTATAGGT
Serotype VI .......... .......... ......CATA AATTAGAAAA AATTATAGGT
Consensus __________ __________ __________ __*_*____* __**___**_

501 550
Serotype IV GTTGTGTGAT GAAA...... .ATCAATAAA AACTCTTTAT TTTATATTGC
Serotype V GTTGTGCGAT GAAA...... .ATCAATAAA AACTCTTTAT TTTATATTGC
Serotype Ia GAAAAAAATT GAAATTAACA TATCAATCCA AAGTATTTGT TAATAGGAGG
Serotype Ib GAAAAAAATT GAAATTAAC. TATCAATCCA AAGTATTTGT TAATAGGAGG
Serotype III GAAATATGAG GAAAT....A TCTAGATTTA GATTATTCTT TATTTTATGC
Serotype VI GAAATATGAG GAAAT....A TCTAGATTTA GATTATTCTT TATTTTATGC
Consensus *_________ ****______ __*__**__* _*_*_**__* *_______*_
551 600
Serotype IV AATATTTTTA GTTAATTTTT TTAAATCACT AGGTTTAGGA GAGGGGAACT
Serotype V AATATTTTTA GTTAATTTTT TTAAATCACT GGGTTTAGGC GAGGGAAACT
Serotype Ia AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT
Serotype Ib AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT
Serotype III TCTTTGGGTA CTTATTTTAG TACCAAACCA ATGGTATCAG TTTTTAATTA
Serotype VI TCTTTGGGTA CTTATTTTAG TACCAAACCA ATGGTATCAG TTTTTAATTA
Consensus __*_*_____ _*_*___*__ *___*___*_ __________ __________

cpsH

601 650
Serotype IV CAACTTACAA AATAGTGATG TTTGTTGCAA TCTTCTTGTG TGGAATAAAA
Serotype V CAGCTTACAA AATAGTGATG TTAGTTGCAA TTTTACTGTG TGGAATAAAA
Serotype Ia TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAAGAA AAAAATGAAA
Serotype Ib TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAATGA AAAAATGAAA
Serotype III TTACCATTAT AGTTCTATTA TTACTTTGGA AGAGTGAGTT TAGAAT...A
Serotype VI TTACCATTAT AGTTCTATTA TTACTTTGGA AGAGTGAGTT TAGAAT...A
Consensus ___*____*_ *_*__*__*_ _*___*___* __________ ___***___*

651 700
Serotype IV TTTTTA.... ..TTAGATAG CCTTTATTTT GAAAGAAGAA AACTCGTTAT
Serotype V TTTTTA.... ..TTAGATAG CCTTTATTTT GAAAGAAGAA AACTCGTGAT
Serotype Ia TTTTTATATA TGGCTGAAAT TTTTTTCATT GTATTTTATA TCATTTATTT
Serotype Ib TTTTTAAATA TGGCTGAAAT TTTTTTCATT GTATTTTATA TGGTTTATTT
Serotype III TCTATAAGCA ATTCTTCAAT ACTATTTCTG CTTTGGTTAT TTATTTATTT
Serotype VI TCTATAAGCA ATTCTTCAAT ACTATTTCTG CTTTGGTTAT TTATTTATTT
Consensus *_*_**____ ________*_ __*_*___*_ __________ ___*_____*

701 750
Serotype IV CATCTTTTTA TTATTTATTG CGACCATTTT GAATTTATTC TTTGTTCATA
Serotype V CATCTTTTTA TTATTTATCG CGACCATTTT GAATTTATTC TTTGTTCATA
Serotype Ia AACTTCAATA TTGCTACATT CTTTGTTTAA AACTCCTGAT TTTGATAGAA
Serotype Ib AGTATCAATA GTATTAAATT CGTTATTTAG AAGTCCAGAA TTTCATAGAG
Serotype III ATTTGCAATA CTCATTAGAG GTACTCAAGA GGATATAACG TTTCAGCGAT
Serotype VI ATTTGCAATA CTCATTAGAG GTACTCAAGA GGATATAACG TTTCAGCGAT
Consensus ________** _*__*_____ __________ ___*______ ***_______

751 800
Serotype IV AGGTTACTTT TATATTAA.. .......... .........C TTTAATTTTT
Serotype V AGGTTACTTT TATATTAA.. .......... .........C TTTAATTTTT
Serotype Ia TTTTAGCAGC TTTTAACTCG TTGATTATCG GTATAGTATC AGTGGCTTTG
Serotype Ib TCATTGCTGC ATTCAATTCA CTGGCAGTAG GGGTTGTGTC CTTATTATTT
Serotype III TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC TTTATTTTTT
Serotype VI TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC TTTATTTTTT
Consensus ___*__*___ __*_______ __________ _________* __*____**_

801 850
Serotype IV TTTCTAGCAT TAAAGGATAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT
Serotype V TTTCTAGCAT TAAAGGACAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT
Serotype Ia AAACGGTGGT ATAAGAATAC AACTTTGGAG TTAGATAAAA TATTAAAAGC
Serotype Ib TACCATTACT ATAAGAATAC TAATATTGAA TTAACAAAAT TGCTAAAATC
Serotype III TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAATGT
Serotype VI TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAATGT
Consensus _________* __*___*___ ___*_*__*_ __________ _____*____



SUBSTITUTE SHEET (Rule 9.2) 



PCT/AU02/01281 



851 900
Serotype IV AGGATCGCGT ATTTTGGGAG TTCTATTAAA TCAAATTTTT GTGAAATTAG
Serotype V AGGATCGCGT ATTTTGGGAG TTCTATTAAA TCAAATTTTT GTGAAATTAG
Serotype Ia ATTTTTATTT AATGGGTTAA TCCTATTTTT TTTAGGGGGA ACATATTATT
Serotype Ib ATTTTTGTTT AATGCAATTA TTTTGTTTTG TTTAGGATTT CTATATTATT
Serotype III GGTAAAGGTT AACTATTTTG TGTTGTTTCT TATAACAGTT TTATATT...
Serotype VI GGTAAAGGTT AACTATTTTG TGTTGTTTCT TATAACAGTT TTATATT...
Consensus _________* *_________ *__*_**___ *__*______ ____*_*___

901 950
Serotype IV ATTTAATAGA AATTAAATAT ATCAATTTTT ATAGGGATGG ACAATTTATT
Serotype V ATTTAATAGA AATTAAGTAT GTCAATTTTT ATAGGGATGG ACAATTTATT
Serotype Ia ATTGTTTGCA TAATAATATT CAAAATATCA GTATTTTTGG TAGAGATTTG
Serotype Ib ATGCCATATA TTTTGATGTA GAGAATGTAA GTCTTTTTGG AAGAAATTTA
Serotype III ......TATT TTTTCCTATG CTGAAGCCAA CTTTATTTGG AAGAGAATTG
Serotype VI ......TATT TTTTCCAAAT GAATTTACTA CATTCCTAGG AAGAGATTTA
Consensus ______*___ ___*______ __________ ________** ___*____*_

951 1000
Serotype IV CTGAGAAGTG ACT....... .......... .......... ......TAGG
Serotype V CTGAGAAGTG ACT....... .......... .......... ......TAGG
Serotype Ia ATTGGGTCAG ACTGGATTAA TGGTATGCAT ACTCAAAGAG CAATGGGATT
Serotype Ib ATTGGATCAG ATTGGATAAA TGGGATGCAT ACGCAGAGAG CAATGGCTTT
Serotype III TTTTCAATAG AGTGGTTTCC ACATATG... AGAATAAGAC TTGCGGCATA
Serotype VI TTTTCAATTG AATGGATTCC TTCTATG... AAAGTTAGAC TTACTGCATA
Consensus _*_______* *_*_______ __________ __________ __________

1001 1050
Serotype IV TTTTGGTCAT CCTAACTTTA TTCATAATTT TTTTGCAGTA ACTGTTTTTT
Serotype V TTTTGGTCAT CCTAACTTTA TTCATAATTT TTTTGCTCTA ACTATTTTCT
Serotype Ia TTTTGAATAT TCAAACCTTA TAATTCCTAT GACAGTGGTA ACTAACTATA
Serotype Ib CTTTGAATAT TCAAATCTTA TAATACCCTT AACTATCATA ACTAA.TATA
Serotype III TTTTGAATAT GCTACACTAA TTGGTCAGTT TATTTTATTT TCTTATCCCA
Serotype VI TTTTGAGTAT GCAACACTAT TAGGTCAGTT TATTTTATTC ACTTATCCGA
Consensus _****___** _*_*___*__ *________* ________*_ _**_______

1051 1100
Serotype IV TATATGTAAC ACTTTTTTAT AGAAAACTAA GAT.TAATAA CTATTGCTTT
Serotype V TGTATATTGT ACTCAATTAT AAACGACTAA AGC.CTGTTG TGATGGTTTT
Serotype Ia TATATATATA T.TATATGAA GTTAAGAAAC TATTCAATTA TGACCATAGG
Serotype Ib TATATATATA TATATATTAA GCAAAGATAT AGCTCAGGGA TGATGATACT
Serotype III TAC....... TTTTTTTGAA ACCCCAAAAA CATATGGAAA ATATTTTAAT
Serotype VI TAT....... TATTTTTAAA ACAGCAGAGG TATGGAGAAA ATATTTTTAT
Consensus *_________ __*___*_*_ __________ __________ __*_______

1101 1150
Serotype IV TATTTTAACT CTAAATTACT TCTTGTATCA GTATACTTAT TCAAGAACTG
Serotype V ATTTTTAACA TTAAATTATT TATTGTACCA ATATACTTTT TCAAGGACAG
Serotype Ia TGTTGTATTA TTATTTACCT TTATTTTACC TATTGGATCG GGCTCCAGGG
Serotype Ib CGGTGCTCTT CTCTCCACTA TTATACTACC CATCGGGTCT GGATCTAGAG
Serotype III ATCCTTACTG TTGACTATAT GTTCATACTT TTCTGGCGCT AGAATACTAT
Serotype VI CACACTATTC CTAGTTTTTT GTGCATATTT GACAGGGGCA AGAATTTTCC
Consensus __________ _*________ __________ __________ __________
1151 1200
Serotype IV GATATTATAT AGTACTCTTA TTTATACTTA TTATATATGT TACAAAGAAT
Serotype V GGTATTATAT CGTAATTTTA TTTATTGTAC TCATTTATGT GACAAAGAAT
Serotype Ia CTGGAATAGT AGCTATATTG GCGCAGATGT TTATTCTTCT TCTAAATACA
Serotype Ib CTGGTATTAT AGTTGTGCTA CTACAGGTTA TAATTTTATT GTTGAATACA
Serotype III TGGTCTGTAT GTTGGTTTTA TTAGCATCGC TTCTTTTAGA TTATATCCTT
Serotype VI TAATTTGTAT GATAATTTTA TTAGGTTATT TACTCTTAGA AATAATCATT
Consensus _________* _____*__*_ __________ *__*______ ____*_____

1201 1250
Serotype IV AACCTGATAA GGAAAATTTT TATGATAGTT GCTCCGTACA TACAACTGTT
Serotype V AGCTTAATAA AAAGAGTATT TATGAAATTA GCACCCTATG TACAATTTTT
Serotype Ia GTTGTCGTAA AGAAGAAAAC TATAAAATTT TTATTGTACA TACTTCCGTT
Serotype Ib ATTGTAATAA AAAGACAAAC GATAAGATTT TTCCTGTATT TAGTTCCGAT
Serotype III TTTAAAACTA ATTTGAAATT GACCAAGAAA AACACTTTTA TACTTGGTAT
Serotype VI AATAAATTTA ACCTAAAAAT TACTAAAAAA GCTGTCTTTT TGATAATTAT
Consensus _________* __________ _*__*_____ ______*___ *________*

1251 1300
Serotype IV CTTGTTAGCA TTTACTTTTC TTTGCTCTAC TATTTTTTTC AACTCAAATT
Serotype V TTTATTAGTA TTTACCTTTT TGAGTTCTAC AATTTTTTTT AATTCAAATT
Serotype Ia TCTACTAGTA ATAGTAATGA TGTTATATTT TGATAACTTA CTATCTATAT
Serotype Ib ACTAATATTA CTATTAGTGA TATTACGTTT TGATAATTTG GTGAGCATAT
Serotype III GACTTTCTTA TTTATCACCG CTTGTTTTTC TTATAACATA TGGTCAATAA
Serotype VI AGGGATAATA TTATTATTGG TATGTTTTTC TTACAAAGTG GAGTCTATTA
Consensus _____*___* _*________ _______*__ ________*_ ______*___

1301 1350
Serotype IV TTGTTCAAAA ATTAGATAGC CTTTTGACAG GTAG...... ..........
Serotype V TTGTTCAAAA ATTAGATGTT CTTTTAACAG GTAG...... ..........
Serotype Ia ATTATCGTAT AATTAATTTG CGATCCGGGA GTAGTGAATC CAGATTTTCT
Serotype Ib ATAATAGAAT AATCAATTTG CGGTCGGGAA GTAGTGAATC TAGATTTTCT
Serotype III TTGAAAAAAT AATTATGTAC AGAAACCAAA GTACTATCAC TAGGATGATA
Serotype VI TCAATTATAT AATACACTAT AGATTTCAAA GTAGTAGTAC AAGATTGACA
Consensus ________*_ *_*_______ __________ ***_______ __________

1351 1400
Serotype IV ..GTTAAACT ATGCTCATTT ACAGCTTGTA GACGGCTTAA CTCTTTTTGG
Serotype V ..ATTACACT ATGCTCATTT ACAACTTGTA GATGGTTTAA CTCCTTTTGG
Serotype Ia GTATATAAAG ATACAGTAAA CATCGTTATA AATAATTCTT TATTATTTGG
Serotype Ib TTGTACAAGG ATACCGTACA CTCAGTAATT ACTGACTCAC TATTTCTGGG
Serotype III GTTTATCAAG AAAGTATTAT TGAAGTTCTA AAAGGAAATA TTTTATTTGG
Serotype VI GTCTATTACG AAAGTATAAG AGCGATTTTA GATGGGAATT TCCTTATTGG
Consensus ___*___*__ *_________ _____*__*_ __________ ______*_**

1401 1450
Serotype IV AAATAGTTTT AAGGAG.... .......... .........A CGAGTGTCCT
Serotype V AAATAGTTTT AAGGAA.... .......... .........A CAAGTGTCCT
Serotype Ia AGAAGGAGTT AAAGAGTTAT GGTTAAATAG TGATCTACCT TTGGGGTCGC
Serotype Ib AAAAGGTGTA AAAGAATTGT GGTTAAATAG TGATTTACCA CTAGGATCGC
Serotype III ACAGGGTATA AGGA...TTC CATCAAGTGA AGGAATATTC CTAGGATCGC
Serotype VI GCAAGGTATA AGAG...TTC CCTCCAGTGT GGGAATATTT TTAGGTTCAC
Consensus __*__*__*_ *_________ __________ __________ ___*__**__



PCT/AU02/01281 
Received 10 January 2003 



18/25 



1451 1500
Serotype IV ATTTGATAAT AGCTACTCTA TGTTATTGAG TATGTATGGT GTAGTACTTA
Serotype V ATTTGATAAT AGCTACTCTA TGTTATTGAG TATGTATGGT GTAGTACTTA
Serotype Ia ATTCAACGTA TATAGGCTAT TTCTACAAAA GTGGCCTGCT GGGATTAATG
Serotype Ib ATTCGACCTA CATAGGTTAT TTCTATAAAA CTGGCCTATT TGGACTAATA
Serotype III ATTCTACTTA TATTAGTGTC TTTTACAGGA CTTCTTTATT AGGAATTGTT
Serotype VI ATTCATCATA CATTAGTATA TTTTATAGAA CTTCTTTTAC GGGGCTGTTT
Consensus ***_______ ____*_____ *__**_____ ______*___ ________*_

1501 1550
Serotype IV CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA AGTCAATGTA
Serotype V CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA GATAATCATA
Serotype Ia AATATAGTTC CAGGTTTGCT TTTAAT.TTT TACTAATATT GGTAGGAAAG
Serotype Ib AATGTGATTT TAGGTTTGTT TCTAAT.TCT TATTAGCATT ATCAAGGAAG
Serotype III CTTTATTTTT CTGCCTTTAT ACTTTTATAT AAAGAAGCGA TTTCAAAAAA
Serotype VI CTTTTCTTTT CAATATTACT TTTTCTATAT AGAGAAGCTA TCAAACAAAA
Consensus _______**_ _____*____ __*__*_*_* __________ __________

1551 1600
Serotype IV GTTGAGCTCC AGATACTTTT GTTTA..... .......... ........TA
Serotype V ATTGAACTTC AACTACTCCT ATTTA..... .......... ........TA
Serotype Ia CTAAACAATC AGCTTTTTAT TATGAGATAG TAGGAACACT TATAACTTTA
Serotype Ib CTAAAAAGTC AGATTTCTAT TATGAGATAG TAGGGTCTGT CATACTCCTA
Serotype III TTATAAAATC TACAGATTAT TTT....... .........T TTATACGTTA
Serotype VI CAGGATAATC TACAAGCTTT TTT....... .........T TGGATTGTTA
Consensus ____*____* _________* __*_______ __________ ________**

1601 1650
Serotype IV ATGTCTATAG TATTATTTAC AGAGAGTTTT TACCCAAGTA TAGTTATGAA
Serotype V ATGTCTATAA TATTATTTAC TGAAAGTTTT TATCCCAGTG TGGTAATGAA
Serotype Ia TTCTCATTTT TTGCACTTGA AGATCTTGAC GGAGCTAATT GGCTTATTGT
Serotype Ib TTTTCATTTT TTGCACTTGA AGATATTGAT GGCGCCAATT GGCTCATTAT
Serotype III TTATGTTACA CGCTCTTTGA GGAAATAGAT CCTAATCATT GGAGTATTGT
Serotype VI TTATTGTATA TGGTATTTGA AGAATTTGAT CCTAATCATT GGAGTGTTGT
Consensus _*_*______ ______**__ _**_______ ________*_ ______*___

1651 1700
Serotype IV TATTAGTTGG ATGGTTTTTG GGAAAATATT TTGTGGGGGT GTAGATGATT
Serotype V TATTAGTTGG CTAGTTTTTG GTAAAATATT TTGTGATGGT ATCGAACCTA
Serotype Ia TTTTATTTTT ACAGTGTTAG GAATTTTAGA AAATAAGGAT TTTTATAGTC
Serotype Ib TTTTGTCTTT ACAGTGTTGG GAATTTTAGA AAATAAGGAT TTCTATAGTC
Serotype III ATTATTATTC TCAACTTTTG GTATAGTGGG AAGGGCTAA. ..........
Serotype VI ATTGTTATTT ACTACATTAG GTATAGTAGG GAGAG.GGA. ..........
Consensus __*____*__ ______**_* *_*___*___ __________ __________

1701 1750
Serotype IV TACAAC.... ...GAGAGTT CACTTGGACG GCAAATAAAA ATTAGTGTAA
Serotype V TAAAAA.... ...AGGAATT TACT...ATT GTGAATAATA TATGACATAT
Serotype Ia AACTTAAAAG GTGGAAAAGT TAATGGAAAA ACGAATACTT GTTTCTATCA
Serotype Ib AACTTAAAAG GTGGGAAAGT TAATGGAAAA ACAAATACTT GTTTCTATCG
Serotype III .......... .......... .....AAAAT GAAAGAAAAA GTAACAGTCA
Serotype VI .......... .......... .....ATGAT AAAAAAACTA GTTAGTGTGA
Consensus __________ __________ __________ ___*__*___ _______*__
1751 1800
Serotype IV TTGTACCAGT ATATAATTCG AAACAATATT TAATAGCTTG CGTTGATTCA
Serotype V TTGCTCTGAT ATGGCAGGAG GTAAGGAAGG AAAATGATAC CTAAAGTTAT
Serotype Ia TTATACCTAT ATACAACTCA GAAGCATACC TTAAAGAATG TGTGCAATCC
Serotype Ib TTATACCTAT ATACAACTCG GAAGCATATC TTAAAGAATG CGTGCAATCC
Serotype III TTATACCTAT ATACAACTCA GAAGCATACC TTAAAGAATG TGTGCAATCC
Serotype VI TTGTTCCAGT TTATAATTCG GAGTTAGTGA TTGAGAACTG TGTAGAATCT
Consensus **___*___* _*___*____ __________ __________ _______*__

cpsI/M

1801 1850
Serotype IV ATTAGAAAAC AAACATATAA GAATTTGGAA ATTATTCTTG TTAATGATGG
Serotype V ACATTATTGT TGGTTTGGAG GAAATCCCTT ACCAGATAAT TTAAAGAAAT
Serotype Ia GTACTACAAC AGACTCATCC ATTGATAGAA GTTATACTAA TTGATGATGG
Serotype Ib GTCCTACAAC AGACTCATTC ATTGATAGAA GTTATACTGA TTAATGATGG
Serotype III GTACTACAAC AGACTCATCC ATTGATAGAA GTTATACTAA TTGATGATGG
Serotype VI TTGCTTCAAC AAACATACCC AGAAATAGAA ATTTTATTAA TAGATGATGG
Consensus __________ __________ __________ __________ *__*_**___

1851 1900
Serotype IV ATCAACAGAT GGTAGTAAAG AGTTATGTGA GGAGATAAGA AAATCAGATG
Serotype V ATATAAAAA. ...CTTGGAG AGAACAATGT CCGGATTATG AAATTATTGA
Serotype Ia ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAAGAAGATA
Serotype Ib ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAAAAAGACG
Serotype III ATCCACTGAT AATAGTGGAG AAATTTGTGA TAATTTATCT CAGGAAGATA
Serotype VI ATCTACAGAT AAAAGTAGTC ATATTTGTAA TAATTTTTTA AAAAGGGATA
Consensus **__*___*_ _____*____ *______*__ _____*____ _*________

1901 1950
Serotype IV AAAGAATTAA GACATTTCAC AAAACAAATG GAGGACAATC AAGCGCAAGG
Serotype V ATGGAATGAG CATAATTATG ATGTTAGTAA AAATGTTTTT ATGAGAGAAG
Serotype Ia ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG
Serotype Ib ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTATC TTCGGCAAGG
Serotype III ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG
Serotype VI GTCGCGTAAA AGTCTATCAT AAATACAATG GAGGTGCATC ATCAGCAAGA
Consensus ___*__*___ ______*___ *_________ _*______*_ ____*__*__

1951 2000
Serotype IV AATTTAGGTA TTTTATACTC TACAGGAGAT TTGATTGGTT TTGTTGACAG
Serotype V CATATACTAA GAAGAATTT. TGCT...... TATGTTTCTG ACTATGCAAG
Serotype Ia AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG
Serotype Ib AACCTAGGTC TTGATAAATC CACAGGCGAA TTCATAACGT TTGTAGATAG
Serotype III AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG
Serotype VI AATGTGGGAC TTGAGATGGC AGAAGGTGAA TTTATAACTT TTGTAGATAG
Consensus _*__*_____ __________ __________ *___*_____ _____*__**

2001 2050
Serotype IV CGACGATACA ATTGACCCTA AAATGTATGA AACGTTACTA AATATATATG
Serotype V ATTGGATATT ATTTATACTT ATGGGGGGTT CTATCTAGAT ACTGATGTGG
Serotype Ia TGATGATTTT GTAGCACCGA ATATGATTGA AATAATGTTA AAAAATTTAA
Serotype Ib TGATGATTTT GTAGCACCGA ATATAATTGA AATAATGTTA AAAAATTTAA
Serotype III TGATGATTTT GTAGCACCGA ATATGATTGA AATAATGTTA AAAAATTTAA
Serotype VI CGATGATGTT GTCGCACTAA ATATGATTGA AATTATGCTG AATAATTTGT
Consensus ____***___ _*________ *_________ _____*____ *_________
Serotype IV
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III
Serotype VI 
Consensus 



1751 

TTGTACCAGT 
TTGCTCTGAT 
TTATACCTAT 
TTATACCTAT 
TTATACCTAT 
TTGTTCCAGT 



ATATAATTCG 
AT GGCAGGAG 
AT ACAACT CA 
AT ACAACT CG 
AT ACAACT CA 
TTATAATTCG 



AAACAATATT 
GTAAGGAAGG 
GAAGCATACC 
GAAGCAT AT C 
GAAGCATACC 
GAGTTAGTGA 



TAATAGCTTG 
AAAATGATAC 
TTAAAGAATG 
TTAAAGAATG 
TTAAAGAATG 
TTGAGAACTG 



1800 
CGTTGATTCA 
CTAAAGT TAT 
TGTGCAATCC 
CGTGCAATCC 
TGTGCAATCC 
TGTAGAATCT 



1801 

ATTAGAAAAC 
ACATTATTGT 
GTACTACAAC 
GTCCTACAAC 
GTACTACAAC 
TTGCTTCAAC 



AAACATATAA 
TGGTTTGGAG 
AGACT CAT CC 
AGACTCATTC 
AGACT CAT CC 
AAACATAC CC 



GAATTTGGAA 
GAAATCCCTT 
ATT GAT AGAA 
ATTGATAGAA 
ATT GAT AGAA 
AGAAATAGAA 



cpsl/M 

ATTATTCTTG 
AC CAGAT AAT 
GTTATACTAA 
GTTATACTGA 
GTTATACTAA 
ATTTTATTAA 



1850 
TTAATGATGG 
TTAAAGAAAT 
TTGATGATGG 
TTAATGATGG 
TTGATGATGG 
TAGAT GAT GG 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



1851 

AT C AACAG AT 
ATATAAAAA. 
ATCCACTGAT 
ATCCACTGAT 
ATCCACTGAT 
AT CTACAGAT 



GGTAGTAAAG 
• . . CTTGGAG 
AATAGTGGAG 
AATAGTGGAG 
AATAGTGGAG 
AAAAGTAGTC 



AGTT AT GTGA 
AGAACAATGT 
AAATTTGTGA 
AAATTTGTGA 
AAATTTGTGA 
ATATTT GTAA 



GGAGATAAGA 
CCGGATTATG 
TAATTTATCT 
TAATTTATCT 
TAATTTATCT 
TAATTTTTTA 



1900 
AAAT CAGAT G 
AAATTAT TGA 
CAAGAAGATA 
CAAAAAGACG 
CAGGAAGATA 
AAAAGGGATA 



1901 1950 

Serotype IV AAAGAATTAA GACATTT CAC AAAACAAATG GAGGACAATC AAGCGCAAGG 

Serotype V ATGGAATGAG CATAATTATG AT GTTAGTAA AAATGTTTTT AT GAGAGAAG 

Serotype la AT C GCAT ACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG 

Serotype lb ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTATC TTCGGCAAGG 

Serotype III ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG 

Serotype VI GTCGCGTAAA AGTCTATCAT AAAT AC AAT G GAGGT GCAT C AT CAGCAAGA 

Consensus * — * * * *- * — ★ — 

1951 2000 

Serotype IV ^AATTTAGGTA TTTTATACTC TACAGGAGAT TTGATTGGTT TTGTTGACAG 

Serotype V CATATACTAA GAAGAATTT TGCT TATGTTTCTG ACTAT GCAAG 

Serotype la AACCTAGGTC TAGAT AAAT C CACAGGAGAA TTCATAACAT TTGTGGATAG 

Serotype lb AACCTAGGTC TTGATAAATC CACAGGCGAA TTCATAACGT TTGTAGATAG 

Serotype III AACCTAGGTC TAGAT AAAT C CACAGGAGAA TTCATAACAT TTGTGGATAG 

Serotype VI AATGTGGGAC TTGAGATGGC AGAAGGT GAA TTTATAACTT TTGTAGATAG 

Consensus -* — * * — * * .★__★ + 



Serotype IV 
Serotype V 
Serotype la 
Serotype lb 
Serotype III 
Serotype VI 
Consensus 



2001 

CGACGATACA 
ATTGGATATT 
TGATGATTTT 
TGATGATTTT 
TGATGATTTT 
CGATGATGTT 




ATTGACCCTA 
ATTTATACTT 
GTAGCACCGA 
GTAGCACCGA 
GTAGCACCGA 
GTCGCACTAA 



AAAT GT AT GA 
ATGGGGGGTT 
AT AT GATT GA 
ATATAATT GA 
AT AT GATT GA 
AT AT GATT GA 



AACGTTACTA 
CTAT CTAGAT 
AAT AAT GTTA 
AATAATGTTA 
AAT AAT GTTA 
AATTATGCTG 



2050 
AAT AT AT AT G 
ACT GAT GT GG 
AAAAATTTAA 
AAAAATTTAA 
AAAAATTTAA 
AATAATTT GT 






              2051                                              2100
Serotype IV   AAGATGAACA AGTAGACTGG GTGCAATGTA ATCACAAAAA AATTTACTCT
Serotype V    AGCTTTTAAA AAGTTTAGAT CCTTTGAGGA TTCATGAGTG TTTTCTAGCA
Serotype Ia   TCACTGAGAA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT
Serotype Ib   TCACTGAGGA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT
Serotype III  TCACTGAGAA TGCTGATATA GCAGAAGTAG ATTTTGA... TATTTCGAAT
Serotype VI   TAACGGAGAA CGCAGATATA TCAGAAATTG ATTTCGA... AGTTTCAGAT
Consensus

              2101                                              2150
Serotype IV   AACGGTGTTA ACTTATATTA TAATGGACCT GAATACTATA ATGTGCTTAA
Serotype V    AGGGAGATTA GTTGTGATGT GAATACAGGA TTAATAATTG GCGCTGTTAA
Serotype Ia   GAGAGAGATT ATAGAAAGAA GAAAAGACGA AACTTTTATA AAGTCTTTAA
Serotype Ib   GAGAGAGATT ATAGAAAGAA AAAAAGACGA AACTTTTATA AGGTCTTTAA
Serotype III  GAGAGAGATT ATAGAAAGAA GAAAAGACGA AACTTTTATA AAGTTTTTAA
Serotype VI   GA...TTTTT ATAAAAGAAA AAAAAGAAAA GGTTACTATA GAGTTTTTCA
Consensus

              2151                                              2200
Serotype IV   TAAACAAGAT TTCCTATACG AATTTCTGAG TACAAATAAG ATTTTTAGTT
Serotype V    AGGACATCAC TTTTTAAAAT CAAATATGTC TATATATGAC AAAAGTGATT
Serotype Ia   AAACAATAAC TCTTTAAAAG AATTTTTATC AGGCAATAGA GTGGAAAATA
Serotype Ib   AAACAATAAT TCTTTAAAAG AATTTTTATC AGGTAATAGA GTGGAAAATA
Serotype III  GAATAATAAC TCTTTGAAAG AATTTTTATC AGGTAATAGA GTGGAAAATA
Serotype VI   AAACAATAAG TCTCTCAAAG AATTTTTTTC AGGAAATAAA GTAGAAAATG
Consensus

              2201                                              2250
Serotype IV   CAGTCTGCGA GGGGTTGTTA TCTAGAGATT TAGCTTTAAA AATAAAATTC
Serotype V    TAACTTCTCT TAATAAGACA TGTGTAGAGG TTACAACTAA TTTATTGATA
Serotype Ia   TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGCAA CTTGAGGTTT
Serotype Ib   TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT
Serotype III  TTGTTTGTAC AAAATTATAT AAAAAAAGTA TAATTGGTAA CTTGAGGTTT
Serotype VI   TTGTTTGGGG GAAATTATAT AAAAAAAGCA TTATTGGGGA TTTACGATTT
Consensus

              2251                                              2300
Serotype IV   CGTGAAGAAA AAAAAT...A TGAAGATACA CAGTTTTATT TTGATCTCAT
Serotype V    AACAGAGGGC TTAAGA...A TAAGAATATT ATTCAAAAGA TTGA..TGAT
Serotype Ia   GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAACTCTT
Serotype Ib   GATGAGAATT TAAAAATTGG TGAGGATTTA CTTTTTAATT GTAAAATTTT
Serotype III  GATGAGAACT TAAAAATTGG TGAGGATTTA CTTTTTAATT GCAAACTCTT
Serotype VI   AATGAAAAAT ACAAAATTGG TGAAGACTTG CTATTTAACT TTCAGATTTT
Consensus

              2301                                              2350
Serotype IV   AAAAAATGCT AATAAGTTTG TTATTATAAG CCAACCTTTT TATAATTACT
Serotype V    ATAACAATAT ATCCGAGAAA TTATTTTAAT CCAAAGAATT TATTAACA..
Serotype Ia   ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT
Serotype Ib   ATGTCAAGAG CACTGCATAG TCGTAGATAC GACTTCTTCC TTGTACACCT
Serotype III  ATGTCAAGAG CACCGTATAG TCGTAGATAC GACTTCTTCC TTATATACTT
Serotype VI   AAATAAAGAA CATCGTATAG TTGTAGATAC TAGAAGATCA CTCTATACTT
Consensus






              2351                                              2400
Serotype IV   ACTACAGAAA AAATAGTACA ACAACTTCCT CATATAGTAG CTATCAATGG
Serotype V    ..GGTAAGGT TGATTGTCTG ACTAGTGTTA CCTATTCTAT ACATCATTAC
Serotype Ia   ATCGAATTGT AAAAACTTCC GCAA.TGAAT CAGAAATTCA ACGAAAACTC
Serotype Ib   ATCGCATCGT AAAGACTTCT GCAA.TGAAT CAGGAGTTCA ACGAAAATTC
Serotype III  ATCGAATTGT AAAAACTTCT GTAA.TGAAT CAGAAATTCA ACGAAAACTC
Serotype VI   ATCGTATTGA AGAAAAATCT ATAA.TGAAT CAACAATTTA ATAAAAATAC
Consensus

              2401                                              2450
Serotype IV   GACATAATCG ATATCTGTAC TGAGTGTTAT TATTATGCAA AGGATTTTAA
Serotype V    GAAGGAAGTT GGAAAAGTTC TTCATTTATT TCAGATTCTC TAAAGATTAG
Serotype Ia   ATTAGATTTT ATAACAATTT TTAATGAAGT AAGTAGTTTG GTTCCTGCCA
Serotype Ib   ATTAGATTTT ATAACAATTT TTAATGAAAT AAGCAGTATT GTTCCTGCAA
Serotype III  ATTAGATTTT ATAACAATTT TTAATGAAAT AAGTAGTTTG GTTCCTGCCA
Serotype VI   ATTAGACTTC ATTGATATTT TTAATGAGAT TCATCAGGAT AGTCCGACAG
Consensus

              2451                                              2500
Serotype IV   TGGATTTGAA GAAGTTGCTT TTTCAAGATT ATTTGGTGCA TATTCGTTAG
Serotype V    AGTAAGGCTC ATAATTGATT TTTTATTTGG ATATGGTACT TATAGAATGC
Serotype Ia   AATTGGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT
Serotype Ib   AATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GGTAAAGTGT
Serotype III  GATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT
Serotype VI   AATTGTTTAA TTATGTGGAA GCGAAGTTTG TACGAGAAAA AATCAAGTGT
Consensus

              2501                                              2550
Serotype IV   TAGCTAATAA AATTGTATAT AATAAAGATT ATAGAAAAAC CGAAGAATTA
Serotype V    TTCTAAGGTT TCTAAAGTTA AAGAAATAG
Serotype Ia   CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT
Serotype Ib   CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAGTA AAATCAAATT
Serotype III  CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT
Serotype VI   TTAAGGAAAA TGTTTGAATT AGGAGAAATA GCTGATGAAA ATTTACGTTT
Consensus

              2551                                              2600
Serotype IV   AGATAA
Serotype V
Serotype Ia   ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG
Serotype Ib   ACAACGAGAG ATTTTTTTCA AAGATGTTAA ATTATACCCT TTCTATAAAG
Serotype III  ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG
Serotype VI   ACAGAGATAT AAATTTTGGC AAGATATTAA ATCATATTCA ATATGCAAAG
Consensus

              2601                                              2650
Serotype IV
Serotype V
Serotype Ia   CGGTAAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA
Serotype Ib   CGGTTAAGTA CTTATCATTA AAGGGATTAT TGAGTATTTA CTTAATGAAA
Serotype III  CGGTCAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA
Serotype VI   CAATAAGGTT CTTATCTAAA AAACATATCT GTACGTTATA TTTGATGAAA
Consensus
              2651                                              2700
Serotype IV
Serotype V
Serotype Ia   TGTTCACCTA AACTATATGT TATGGCATAT AGAAGATTCA AAACAGTAGC
Serotype Ib   TGTTCACCCA TCTTGTATAT AAAATTATAT GACAGGTTTC AAAAACAGTA
Serotype III  TGTTCACCTA AACTATATGT TATGGCATAT AGAAGATTTC AAAAACAGTA
Serotype VI   TATTTTCCGT ACGTATATAT AAAGATGTAT AATAAATTTC AAAAGCAATA
Consensus

              2701                      2728
Serotype IV
Serotype V
Serotype Ia   TGGAGAAATT GGGAAAGAGA ATTTATAA
Serotype Ib   A
Serotype III  G
Serotype VI   A
Consensus

Notes.

Serotype Ia: GenBank accession number AB028896;
Serotype Ib: GenBank accession number AB050723;
Serotype III: GenBank accession number AF163833;
Serotype IV: GenBank accession number AF355776;
Serotype V: GenBank accession number AF349539;
Serotype VI: GenBank accession number AF337958.
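
As an illustration only, the Consensus row of the alignment can be thought of as flagging columns shared by all aligned serotypes. The following Python sketch assumes that "conserved" means the same base in every row with no gap; that definition, and the names `ROWS`, `BLOCK_START` and `conserved_positions`, are assumptions made for this example and are not taken from the specification. The rows are the 2451-2500 block as transcribed from the alignment above.

```python
from typing import Dict, List

BLOCK_START = 2451  # alignment position of the first column in this block

# Rows of the 2451-2500 block; spaces only separate groups of ten bases.
_RAW: Dict[str, str] = {
    "IV":  "TGGATTTGAA GAAGTTGCTT TTTCAAGATT ATTTGGTGCA TATTCGTTAG",
    "V":   "AGTAAGGCTC ATAATTGATT TTTTATTTGG ATATGGTACT TATAGAATGC",
    "Ia":  "AATTGGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT",
    "Ib":  "AATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GGTAAAGTGT",
    "III": "GATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT",
    "VI":  "AATTGTTTAA TTATGTGGAA GCGAAGTTTG TACGAGAAAA AATCAAGTGT",
}
ROWS: Dict[str, str] = {name: seq.replace(" ", "") for name, seq in _RAW.items()}


def conserved_positions(rows: Dict[str, str], start: int) -> List[int]:
    """Return alignment positions at which every row carries the same base."""
    seqs = list(rows.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned rows must all have the same length")
    positions: List[int] = []
    for i in range(width):
        column = {s[i] for s in seqs}
        # Conserved here means one distinct symbol and no gap character.
        if len(column) == 1 and "." not in column:
            positions.append(start + i)
    return positions


if __name__ == "__main__":
    print(conserved_positions(ROWS, BLOCK_START))
```

Restricting `rows` to a subset of serotypes (for example Ia, Ib and III only) gives the columns shared within that subset, which is one way to look for positions that separate one group of serotypes from another.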
Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158, upper lines) and
alp3 (AF291065, lower lines) used to distinguish them (relevant primers are shown).

251 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
531 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 580

    bcaS1

301 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
581 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 630

    bcaS2

351 CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC 400
    ||||||*||||||||||||||||||||||||||||||||||||||*||||
631 CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC 680

401 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
681 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 730

    balS

451 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
731 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 780

501 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
781 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 830

551 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
831 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 880

601 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
881 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 930

651 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
931 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 980

    balA
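
The two starred sites in Figure 4 can be located mechanically by comparing the aligned alp2 and alp3 rows column by column. The Python sketch below does this for the 351-400 / 631-680 block only; the function names and the choice of `|` and `*` for the match line are illustrative assumptions, and the sequences are transcribed from the figure rather than retrieved from GenBank.

```python
from typing import List, Tuple

# Block containing both starred sites, transcribed from Figure 4.
ALP2_START = 351  # numbering of the upper (alp2, AF208158) row
ALP3_START = 631  # numbering of the lower (alp3, AF291065) row
ALP2 = "CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC"
ALP3 = "CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC"


def mismatches(upper: str, lower: str,
               upper_start: int, lower_start: int) -> List[Tuple[int, str, int, str]]:
    """List (upper position, upper base, lower position, lower base) per mismatch."""
    if len(upper) != len(lower):
        raise ValueError("aligned rows must have the same length")
    return [
        (upper_start + i, a, lower_start + i, b)
        for i, (a, b) in enumerate(zip(upper, lower))
        if a != b
    ]


def match_line(upper: str, lower: str) -> str:
    """Build the middle line of a pairwise alignment: '|' match, '*' mismatch."""
    return "".join("|" if a == b else "*" for a, b in zip(upper, lower))


if __name__ == "__main__":
    for site in mismatches(ALP2, ALP3, ALP2_START, ALP3_START):
        print(site)
    print(match_line(ALP2, ALP3))
```

On the transcribed block this reports a C/T difference at alp2 position 357 (alp3 637) and a T/C difference at alp2 position 396 (alp3 676), which correspond to the two starred columns.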
Genetic markers of GBS genotypes    Strain No. (%)    Genotypes

V-alp3-IS1381-ISSa4                 2 (1)             V-2
V-alp3-IS1381-IS861-ISSa4           1 (0.5)           V-3
V-alp3-IS1381                       27+1 (14)         V-1
V-alp3                              1 (0.5)           V-7
V-alp3-IS1381-IS861-GBSi1           1 (0.5)           V-4
Ib-alp3-IS1381                      1 (0.5)           Ib-14
V-None-GBSi1                        1 (0.5)           V-6
III-alp3-IS1381-GBSi1               1 (0.5)           III-4-2
II-alp3-IS1381-IS1548-GBSi1         1 (0.5)           II-10
II-alp3-IS1381-IS1548               1 (0.5)           II-8
II-alp3-IS1381-IS861                1 (0.5)           II-9

II-a-IS1381-ISSa4-GBSi1             2 (1)             II-4
VI-as-IS1381                        1 (0.5)           VI-4
VI-a-IS1381                         1 (0.5)           VI-5
IV-a-IS1381                         5 (3)             IV-1
IV-a-IS1381-IS861                   1 (0.5)           IV-2
Ia-as-IS1381                        3 (2)             Ia-2
Ia-a-IS1381-IS861                   3 (2)             Ia-3
Ia-a-IS1381                         34+1 (18)         Ia-1
Ia-None-IS1381                      1 (0.5)           Ia-8
Ia-a-IS1381-ISSa4                   1 (0.5)           Ia-4
Ia-AaB5a-IS1381-IS861               1 (0.5)           Ia-6
Ia-a-IS1381-GBSi1                   1 (0.5)           Ia-5

II-Aa-IS1381-IS861-ISSa4            2 (1)             II-2
II-AaB9-IS1381-IS861-ISSa4          1 (0.5)           II-7
II-AaB4-IS1381-IS861-ISSa4          1 (0.5)           II-5
II-AaB1-IS1381-IS861-IS1548         2 (1)             II-3
II-Aa-IS1381-IS861-IS1548-GBSi1     1+1 (1)           II-12
II-AaB4-IS1381-IS861-IS1548         1 (0.5)           II-6

VI-AaB3b-IS1381-IS861               +1 (0.5)          VI-2
VI-Aa-IS1381-IS861-GBSi1            1 (0.5)           VI-3
VI-AaB7a-IS1381-IS861               1 (0.5)           VI-1
Ib-AaB3c-IS1381                     1 (0.5)           Ib-6
Ib-AaB7-IS1381                      1 (0.5)           Ib-9
Ib-AaB10-IS1381-IS861               1+1 (1)           Ib-13
Ib-AaB10-IS1381-IS861-ISSa4         1 (0.5)           Ib-12
Ib-AaB1-IS1381-IS861                13+2 (8)          Ib-1
Ib-AaB1-IS1381                      1 (0.5)           Ib-2
Ib-AaB3-IS1381-IS861                2 (1)             Ib-5
Ib-AaB4b-IS1381-IS861               1+1 (1)           Ib-8
Ib-AaB8-IS1381-IS861                1 (0.5)           Ib-10
Ib-AaB4-IS1381-IS861                1 (0.5)           Ib-7
Ib-AaB1a-IS1381-IS861               1 (0.5)           Ib-3
Ib-AaB2-IS1381-IS861                1 (0.5)           Ib-4
Ib-AaBN1-IS1381-IS861               1 (0.5)           Ib-15
Ib-AaBN2-IS1381-IS861               1 (0.5)           Ib-16
Ib-AaB9a-IS1381-IS861               1 (0.5)           Ib-11

III-4-AaB2-IS1381-IS861-GBSi1       1 (0.5)           III-4-1

III-R-IS861-GBSi1                   13+4 (9)          III-2
II-R-IS1381-IS861-GBSi1             4 (2)             II-1
IV-R-IS1381-IS861-GBSi1             1 (0.5)           IV-3
III-R-IS1381-IS861-IS1548           22+4 (13)         III-1
V-RB3a-IS1381-IS861                 1 (0.5)           V-5

III-alp2as                          3+1 (2)           III-3
Ia-alp2as                           1 (0.5)           Ia-7
II-alp2as-GBSi1                     1 (0.5)           II-11

Total=56 genotypes                  Total=177+17      56 genotypes
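
Each entry in the "Genetic markers" column reads as a hyphen-joined string. A minimal Python sketch of splitting such a string is given below, on the assumption (not stated in the table itself) that the fields are, in order, serotype or serosubtype, protein antigen gene profile, and zero or more mobile genetic elements, and that mobile elements are exactly the fields beginning with "IS" or "GBSi". The names `Genotype` and `parse_marker` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed convention: mobile genetic element names start with one of these.
MOBILE_PREFIXES = ("IS", "GBSi")


@dataclass(frozen=True)
class Genotype:
    serotype: str                     # e.g. "III", or a serosubtype such as "III-4"
    protein_profile: str              # e.g. "alp3", "R", "AaB2", "None"
    mobile_elements: Tuple[str, ...]  # e.g. ("IS1381", "IS861")


def parse_marker(marker: str) -> Genotype:
    """Split a marker string into serotype, protein profile and mobile elements."""
    fields = marker.split("-")
    elements = tuple(f for f in fields if f.startswith(MOBILE_PREFIXES))
    rest = [f for f in fields if not f.startswith(MOBILE_PREFIXES)]
    if len(rest) < 2:
        raise ValueError(f"expected serotype and protein profile in {marker!r}")
    # The last non-mobile field is taken as the protein profile; everything
    # before it is the serotype (joined again so 'III-4' stays intact).
    return Genotype("-".join(rest[:-1]), rest[-1], elements)


if __name__ == "__main__":
    for m in ("III-R-IS1381-IS861-IS1548", "V-None-GBSi1", "V-alp3"):
        print(parse_marker(m))
```

Keeping the mobile elements as an ordered tuple preserves the order in which they are written in the table, so two marker strings compare equal only if they list the same elements in the same order.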